This paper is concerned with the problem of implementing an unbounded timestamp object from
multi-writer atomic registers, in an asynchronous distributed system of $n$ processes with distinct
identifiers where timestamps are taken from an arbitrary universe. Ellen, Fatourou and Ruppert
[Ellen et al. 2008] showed that $\sqrt{n}/2 - O(1)$ registers are required for any obstruction-free
implementation of long-lived timestamp systems from atomic registers (meaning processes can
repeatedly get timestamps).

We improve this existing lower bound in two ways. First we establish a lower bound of $n/6 - 1$
registers for the obstruction-free long-lived timestamp problem. Previous such linear lower bounds
were only known for constrained versions of the timestamp problem. This bound is asymptotically
tight; Ellen, Fatourou and Ruppert [Ellen et al. 2008] constructed a wait-free algorithm that uses
$n - 1$ registers. Second we show that $\sqrt{2n} - \log n - O(1)$ registers are required for any
obstruction-free implementation of one-shot timestamp systems (meaning each process can get a
timestamp at most once). We show that this bound is also asymptotically tight by providing a
wait-free one-shot timestamp system that uses at most $\lceil 2\sqrt{n} \rceil$ registers, thus
establishing a space complexity gap between one-shot and long-lived timestamp systems.

Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnu- 
merical Algorithms and Problems; D.1.3 [Programming Techniques]: Concurrent Programming — Dis- 
tributed programming 

General Terms: Algorithms, Theory 

Additional Key Words and Phrases: Timestamps, Solo-termination, Wait-free, Obstruction-free, Space Com- 
plexity, Shared Memory 

1. INTRODUCTION 

In asynchronous multiprocessor algorithms, processes have no information about the 
real-time order of events that are incurred by other processes. In order to solve dis- 
tributed problems effectively, such as ensuring first-come-first-served fairness, or con- 
structing synchronization primitives, it is often necessary that some reliable informa- 
tion about the relative order of these events can be gained. 

Timestamp objects provide a means for processes to label events and then later com- 
pare those labels in order to gain information about the real-time order in which the 
corresponding events have occurred. Such timestamping mechanisms have been used 
to solve numerous problems associated with asynchrony in distributed shared memory and message
passing algorithms. Examples of applications include mutual and $k$-exclusion algorithms
[Lamport 1974; Ricart and Agrawala 1981; Fischer et al. 1989; Afek et al. 1994], consensus
algorithms [Abrahamson 1988], register constructions [Haldar and Vitanyi 2002; Li et al. 1996;
Vitanyi and Awerbuch 1986], or adaptive renaming algorithms [Attiya and Fouren 2003].

In 1978, Lamport [Lamport 1978] defined the "happens before" relation on events occurring in
message passing systems to reflect the causal relationship of events. The happens before relation
is a partial order, where, informally, an event $e_1$ happens before event $e_2$, if there is a
causal relation that forces event $e_1$ to precede $e_2$. Lamport further devised a logical clock
that assigns an integer value $C(e)$, called a timestamp, to each event $e$ such that
$C(e_1) < C(e_2)$ if event $e_1$ happens before event $e_2$. Lamport's logical clock system based
on integers was extended to clocks based on vectors (examples include [Fidge 1988] and
[Mattern 1989]) and matrices ([Wuu and Bernstein 1986] and [Sarin and Lynch 1987]).
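
As an illustration of the integer logical clock described above, here is a minimal Python sketch. The class and method names (`LamportClock`, `local_event`, `send`, `receive`) are hypothetical and chosen only for this example; the update rule (increment on every event, take the maximum on receipt) is the standard one for message passing clocks and is not taken from this paper.

```python
class LamportClock:
    """Integer logical clock for one process (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Every local event advances the clock by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned value travels with the message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # A receipt must be ordered after the matching send.
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
e1 = p.local_event()      # event on p
m = p.send()              # p sends a message stamped with its clock
e2 = q.receive(m)         # q receives it: causally after the send
e3 = q.local_event()      # later event on q
```

Each event in the causal chain receives a strictly larger value than its predecessor, which is the property $C(e_1) < C(e_2)$ stated in the text.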

In shared memory systems, events correspond to method invocations and responses. The happens
before relation orders time intervals associated with method calls. Method call $m_1$ happens
before method call $m_2$, if the response of $m_1$ precedes the invocation of $m_2$. Timestamp
objects provide a mechanism to label events with timestamps from a timestamp universe $T$ through
getTS() (sometimes called timestamping or label) method calls. If $T$ is finite, then the
timestamp object is said to be bounded, otherwise it is unbounded.

Often, T is a partially ordered set, and all timestamps returned by getTS() method
calls during an execution preserve the happens before relation of these method calls. 
Such timestamp objects are called static. Non-static timestamp objects can take the 
current system state into account when comparing the order of two timestamps. Thus, 
different executions can lead to different partial orders of the set T. Sometimes, in 
particular when T is bounded, the happens before relation is only preserved for a sub- 
set of valid timestamps in T, e.g., the set of the last timestamps obtained by each 
process. In this case, timestamp objects often provide a scan method that returns an 



[...] constructions of bounded and unbounded timestamp objects [Lamport 1974; Gawlick et al. 1992;
Israeli and Pinhasov 1992; Israeli and Li 1993; Dolev and Shavit 1997; Dwork and Waarts 1999;
Haldar and Vitanyi 2002; Attiya and Fouren 2003; Guerraoui and Ruppert 2007; Ellen et al. 2008].



[...] registers needed to implement timestamp objects. In order to prove strong lower bounds, the
authors considered a very weak definition of an unbounded non-static timestamp object, that, in
addition to getTS(), provides a method compare($t_1, t_2$) for two timestamps $t_1, t_2 \in T$.
The only requirement is that if a getTS() method $g_1$ that returns $t_1$ happens before another
getTS() method $g_2$ that returns $t_2$, then any later compare($t_1, t_2$) must return true and
any later compare($t_2, t_1$) must return false.

As their main result, Ellen et al. showed that any implementation that satisfies
non-deterministic solo-termination (a progress condition weaker than wait-freedom or
obstruction-freedom, and that is defined in Section 2) requires $\Omega(\sqrt{n})$ registers,
where $n$ is the number of processes in the system. Despite the weak requirements, the best known
algorithm (also in [Ellen et al. 2008]) needs $n - 1$ registers, leaving a large gap between the
best known lower and upper bounds. However, for two stronger versions of the problem, Ellen et al.
obtain tight lower bounds, showing that $n$ registers are necessary, first, for static algorithms,
where $T$ is nowhere dense (i.e., any two elements $x, y \in T$ satisfy
$|\{z \in T \mid x < z < y\}| < \infty$), and second, for anonymous algorithms.

Our Contributions. We distinguish between one-shot timestamp objects, where each process is
allowed to call getTS() at most once, and long-lived ones, where each process can call getTS()
arbitrarily many times. (In either case, the number of compare() method calls is not restricted.)
We first improve the $\Omega(\sqrt{n})$ lower bound of [Ellen et al. 2008] for long-lived
timestamp objects to an asymptotically tight one:

Theorem 1.1. Any long-lived unbounded timestamp object that satisfies non-deterministic
solo-termination uses at least $n/6 - 1$ registers.

Therefore, even under very weak assumptions, at least linear register space is neces- 
sary. Since it is not possible to implement general timestamp objects using sublinear 
space, it makes sense to look at restricted solutions. 

Several methods have solutions that are simpler than the general case, if each pro- 
cess is allowed to execute it only once. Examples are renaming and mutual exclusion 
algorithms, splitter or snapshot objects, or agreement problems. Other problems, such 
as consensus or non-resettable test and set objects are inherently "one-time". It is con- 
ceivable that if an implementation of such an algorithm uses timestamp objects, then 
in the "one-shot" version of that algorithm each process needs to obtain a timestamp 
only once. Therefore, we study the space complexity of one-shot timestamp objects: 

Theorem 1.2. Any one-shot unbounded timestamp object that satisfies non-deterministic
solo-termination uses at least $\sqrt{2n} - \log n - O(1)$ registers.

This one-shot lower bound is a factor of approximately $2\sqrt{2}$ larger than the previous best
known lower bound for the long-lived case [Ellen et al. 2008], and holds for historyless objects
as well as registers, as explained later.

Theorem 1.3. There is a wait-free implementation of one-shot timestamp objects that uses
$2\lceil \sqrt{n} \rceil$ registers.
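
To get a feeling for the gap between the long-lived and one-shot bounds, the following Python sketch evaluates the expressions quoted above for a given $n$. It is only an illustration: the function name `register_bounds` is hypothetical, the $O(1)$ terms are dropped, and the logarithm is assumed to be base 2, which the text does not state.

```python
import math


def register_bounds(n):
    """Evaluate the register bounds quoted in the text (O(1) terms dropped)."""
    return {
        # Earlier long-lived lower bound of Ellen et al.
        "prior_long_lived_lower": math.sqrt(n) / 2,
        # Theorem 1.1 and the (n - 1)-register algorithm of Ellen et al.
        "long_lived_lower": n / 6 - 1,
        "long_lived_upper": n - 1,
        # Theorem 1.2 (log assumed base 2) and Theorem 1.3.
        "one_shot_lower": math.sqrt(2 * n) - math.log2(n),
        "one_shot_upper": 2 * math.ceil(math.sqrt(n)),
    }


b = register_bounds(10_000)
for name, value in b.items():
    print(f"{name:>24}: {value:10.2f}")
```

For large $n$ the one-shot upper bound falls below the long-lived lower bound, which is the space complexity gap the theorems establish.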

Our lower bound proofs are based on covering arguments (as introduced by Burns and Lynch
[Burns and Lynch 1993]), where one constructs an execution in which processes are poised to write
to some registers (the processes are said to cover these registers). We rely on a lemma by Ellen,
Fatourou and Ruppert [Ellen et al. 2008] that
shows how in a situation where some processes cover a set R of registers, other pro- 
cesses can be forced to write outside of R. In order to obtain our improved lower bound 
for the long-lived case, we look at very long executions in which "similar" coverings are 
obtained over and over again. Our lower bound proof for the one-shot case is inspired 
by a geometric interpretation of the covering structure of configurations. The one-shot 
timestamps upper bound exploits the structure exposed by the lower bound. 

2. PRELIMINARIES 

We consider an asynchronous shared memory system with a set $\mathcal{P} = \{p_1, \dots, p_n\}$ of
$n$ processes and a set $\mathcal{R} = \{r_1, \dots, r_m\}$ of $m$ registers that support atomic
read and write operations. Processes can only communicate via those operations on shared
registers. We assume that processes can make arbitrary non-deterministic decisions, but we require
that the result of any execution is correct, meaning that the responses from method calls match
the specification of timestamp objects.

A configuration $C$ is a tuple $(s_1, \dots, s_n, v_1, \dots, v_m)$, denoting that process $p_i$,
$1 \le i \le n$, is in state $s_i$, and register $r_j$, $1 \le j \le m$, has value $v_j$.
Configurations will be denoted by capital letters, and the initial configuration is denoted $C_0$.

An implementation of a method satisfies non-deterministic solo-termination, if for any
configuration $C$ and any process $p_i$, $1 \le i \le n$, there is an execution in which no
process other than $p_i$ takes any steps, and $p_i$ finishes its method call within a finite
number of steps [Fich et al. 1998]. Hence, a process is guaranteed to finish its method call with
positive probability, whenever there is no interference from other processes. For deterministic
algorithms, non-deterministic solo-termination is the same as obstruction-freedom and weaker than
wait-freedom. Both our lower bound results hold for timestamp objects that satisfy this progress
condition; our algorithm, however, satisfies the stronger wait-free progress property.

A schedule $\sigma$ is a (possibly infinite) sequence of process indices. An execution
$(C; \sigma)$ is a sequence of steps beginning in configuration $C$ and moving through successive
configurations one at a time. At each step, the next process $p_i$ indicated in the schedule
$\sigma$ takes the next step in its program. Since our computation model is non-deterministic, we
fix the non-deterministic decision made by $p_i$ in our lower bound proofs. We use an arbitrary
(but fixed) one that guarantees that $p_i$ terminates within a bounded number of steps if it
executes alone. If $\sigma$ is a finite schedule, the final configuration of the execution
$(C; \sigma)$ is denoted $\sigma(C)$. If $\sigma$ and $\pi$ are finite schedules then
$\sigma\pi$ denotes the concatenation of $\sigma$ and $\pi$. Let $P$ be a set of processes, and
$\sigma$ a schedule. If only indices of processes in $P$ appear in $\sigma$, then $\sigma$ is a
$P$-only schedule and any execution $(C; \sigma)$ is a $P$-only execution. If $|P| = 1$, a
$P$-only schedule $\sigma$ is a solo schedule and any execution $(C; \sigma)$ is a solo execution.

A configuration, $C$, is reachable if there exists a finite schedule, $\sigma$, such that
$\sigma(C_0) = C$.

Any execution $(C; \sigma)$ defines a partial happens before order on the method calls that occur
during $(C; \sigma)$. A method call $m_1$ happens before $m_2$, denoted $m_1 \to m_2$, if the
response of $m_1$ occurs before the invocation of $m_2$.

An unbounded timestamp object supports two methods, getTS() and compare(). The first one outputs
a timestamp without receiving any input; the compare method receives any two timestamps as inputs,
and returns true or false. If two getTS() instances $g_1$ and $g_2$ return $t_1$ and $t_2$,
respectively, and $g_1 \to g_2$, then compare($t_1, t_2$) returns true and compare($t_2, t_1$)
returns false.
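
The specification above can be checked mechanically on a recorded history. The Python sketch below is one way to do so, under assumptions made only for this example: each getTS() call is a dictionary with hypothetical keys `inv` (invocation time), `resp` (response time) and `ts` (returned timestamp), and `compare` is any two-argument predicate.

```python
from itertools import permutations


def happens_before(a, b):
    # a -> b iff the response of a precedes the invocation of b.
    return a["resp"] < b["inv"]


def satisfies_spec(calls, compare):
    """Check the timestamp requirement for every ordered pair of getTS() calls."""
    for g1, g2 in permutations(calls, 2):
        if happens_before(g1, g2):
            if not compare(g1["ts"], g2["ts"]) or compare(g2["ts"], g1["ts"]):
                return False
    return True


calls = [
    {"inv": 0, "resp": 2, "ts": 1},
    {"inv": 1, "resp": 3, "ts": 1},  # overlaps the first call: no constraint
    {"inv": 4, "resp": 5, "ts": 2},  # starts after both earlier calls respond
]
ok = satisfies_spec(calls, lambda t1, t2: t1 < t2)
```

Note that concurrent calls are unconstrained, so the two overlapping calls may return the same timestamp without violating the requirement.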

A timestamp object is long-lived, if each process is allowed to invoke getTS() multiple times;
it is one-shot when each process is allowed to invoke getTS() only once.

Our lower bounds are based on covering arguments. We will construct executions, at the end of
which processes are poised to write, i.e., they cover several registers. If other processes are
scheduled after this and if they write only to the same set of registers, their trace can be
eliminated. More precisely, we say process $p_i$ covers register $r_j$ in a configuration $C$, if
there is a non-deterministic decision such that the one step execution $(C; (i))$ is a write to
register $r_j$. A set of processes $P$ covers a set of registers $R$ if for every register
$r \in R$ there is a process $p \in P$ such that $p$ covers $r$.

For a process set $P$, let $\pi_P$ denote an arbitrary (but fixed) permutation of $P$ (for
example the one that orders processes by their ID). If the process set $P$ covers the register set
$R$ in configuration $C$, the information held in the registers in $R$ can be overwritten by
letting all processes in $P$ execute exactly one step. Such an execution by the processes in $P$
is called a block-write. More precisely, a block-write by $P$ to $R$ is an execution
$(C; \pi_P)$.
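
The covering and block-write definitions can be made concrete with a toy model. In the Python sketch below, which is an illustration and not part of the paper's formalism, a configuration is reduced to two hypothetical dictionaries: `poised` maps a process ID to the register it is about to write (or `None`), and `pending` maps it to the value it would write.

```python
def covers(poised, procs, regs):
    """True if every register in regs is covered by some process in procs."""
    covered = {poised[p] for p in procs if poised.get(p) is not None}
    return set(regs) <= covered


def block_write(memory, poised, pending, procs):
    """Let each process in procs, in ID order, perform its single pending write."""
    new_memory = dict(memory)
    for p in sorted(procs):
        new_memory[poised[p]] = pending[p]
    return new_memory


memory = {"r1": "trace of q", "r2": "trace of q", "r3": 0}
poised = {1: "r1", 2: "r2", 3: None}
pending = {1: "a", 2: "b"}

is_covered = covers(poised, {1, 2}, {"r1", "r2"})
after = block_write(memory, poised, pending, {1, 2})
```

After the block-write, nothing that another process wrote to the covered registers remains visible, which is why such a trace "can be eliminated".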

Two configurations $C_1 = (s_1, \dots, s_n, r_1, \dots, r_m)$ and
$C_2 = (s'_1, \dots, s'_n, r'_1, \dots, r'_m)$ are indistinguishable to process $p_i$ if
$s_i = s'_i$ and $r_j = r'_j$ for $1 \le j \le m$. If $S$ is a set of processes, and for every
process $p \in S$, $C_1$ and $C_2$ are indistinguishable to $p$, then for any $S$-only schedule
$\sigma$, $\sigma(C_1)$ and $\sigma(C_2)$ are indistinguishable to $p$.

Our first lower bound relies on a lemma which is based on the following observation. Suppose in
configuration $C$ there are three disjoint sets of processes $B_0, B_1, B_2$, each covering a set
$R$ of registers, and $q_0$ and $q_1$ are processes not in $B_0 \cup B_1 \cup B_2$. Let
$\sigma_i$, $i \in \{0, 1\}$, denote an arbitrarily long $\{q_i\}$-only schedule. If, for
$i \in \{0, 1\}$, in the execution $(C; \pi_{B_i}\sigma_i)$, $q_i$ does not write outside $R$,
then the configurations $\pi_{B_i}\sigma_i(C)$ and
$\pi_{B_{1-i}}\sigma_{1-i}\pi_{B_i}\sigma_i(C)$ are indistinguishable to $q_i$. Furthermore,
after a subsequent third block write by $B_2$ all trace left inside of $R$ can also be
obliterated. Thus, the configurations $C_0 = \pi_{B_0}\sigma_0\pi_{B_1}\sigma_1\pi_{B_2}(C)$ and
$C_1 = \pi_{B_1}\sigma_1\pi_{B_0}\sigma_0\pi_{B_2}(C)$ are indistinguishable to all processes,
unless at least one of either $q_0$ or $q_1$ writes outside $R$. If, however, the solo executions
by $q_0$ and $q_1$ both contain complete getTS() calls, then one happens after the other and so
processes have to be able to distinguish between $C_0$ and $C_1$. Hence, either $q_0$ or $q_1$
writes outside $R$ in both of the executions $(C; \pi_{B_0}\sigma_0\pi_{B_1}\sigma_1)$ and
$(C; \pi_{B_1}\sigma_1\pi_{B_0}\sigma_0)$. The same idea works if we replace $q_0$ and $q_1$ with
disjoint sets of processes, as was done in the original version of this lemma due to Ellen,
Fatourou, and Ruppert [Ellen et al. 2008]. We state a simplified form here that suffices for our
results and uses the form and notation of this paper.

Lemma 2.1 ([Ellen et al. 2008]). Consider any timestamp implementation from registers that
satisfies non-deterministic solo-termination and let $C$ be a reachable configuration. Let
$B_0, B_1, B_2, U_0, U_1$ be disjoint sets of processes, where in $C$ each of $B_0$, $B_1$, and
$B_2$ cover a set $R$ of registers. Then there exists $i \in \{0, 1\}$ such that every
$U_i$-only execution starting from $C_i = \pi_{B_i}(C)$ that contains a complete getTS() method
writes to some register not in $R$.

Our second lower bound relies on a stronger lemma that is proved by inductively applying
Lemma 2.1.



3. A SPACE LOWER BOUND FOR LONG-LIVED TIMESTAMPS 

We assume that a timestamp object is used in an algorithm where each process calls 
getTS infinitely many times. Actually, the number of getTS () calls can be bounded 
(by a function growing exponentially in n), but for convenience we pass on computing 
this bound. Ellen et al. used their lemma in order to inductively construct executions
at the end of which $k$ registers are covered by [...] $- k)$ processes, where $k$ is bounded
by $O(\sqrt{n})$. The lemma is used in the inductive step to show that in some execution
following a block-write, many of the non-covering processes can be forced to write outside
the set of covered registers. By the pigeon hole principle, one additional (previously not 
covered) register can then be covered with many processes. With this idea, however, 
the number of processes covering one register is reduced by one in each inductive step,
and thus it is not hard to see that the technique cannot lead to a lower bound beyond
$\Omega(\sqrt{n})$.

In our proof, rather than requiring that many processes cover the same register, we limit the
number of processes covering the same register to three. In particular, we define a
$(3, k)$-configuration to be one where $k$ processes are covering registers, but no register is
covered by more than three of them. Using an argument reminiscent of that used by Burns and Lynch
[Burns and Lynch 1993], we show that if there is an execution that leads to some
$(3, k)$-configuration, we can find a (much longer) execution, during which at least two
$(3, k)$-configurations $C_1$ and $C_2$ are encountered that are similar in the sense that in
both configurations each register is covered by the same number of processes. In addition, the
execution $(C_1; \sigma)$ that leads from $C_1$ to $C_2$ starts with three block-writes to the
registers that are covered by three processes each. We then apply Lemma 2.1 to see that we can
insert a $p$-only schedule for some unused process $p$ into the schedule $\sigma$ after one of the
block-writes to get the new schedule $\sigma'$, such that at the end of the execution
$(C_1; \sigma')$ process $p$ is poised to write outside of the registers that are 3-covered in
$C_1$. Since the other two block-writes overwrite $p$'s trace in $(C_1; \sigma')$, no process
other than $p$ can distinguish between $\sigma'(C_1)$ and $\sigma(C_1) = C_2$. It follows that in
$\sigma'(C_1)$ process $p$ covers a register that is covered by at most 2 other processes. Hence,
we have obtained a $(3, k+1)$-configuration. We can do this for $k < n/2$, so in the end we
obtain a $(3, \lfloor n/2 \rfloor)$-configuration. Clearly, this means that the number of
registers is at least $\lfloor n/6 \rfloor$.

The signature of a configuration $C$, denoted $\mathrm{sig}(C)$, is a tuple
$(c_1, c_2, \dots, c_m)$ where every $c_i$ is the number of processes covering the $i$-th
register in $C$. The set of registers whose corresponding entry in $\mathrm{sig}(C)$ is equal to
3 is denoted $\mathcal{R}_3(C)$. (In terms of signatures, a configuration $C$ is a
$(3, k)$-configuration if $\mathrm{sig}(C) = (c_1, c_2, \dots, c_m)$ satisfies
$\sum_{i=1}^{m} c_i = k$ and $c_i \le 3$ for every $1 \le i \le m$.) Notice that in any
$(3, k)$-configuration there are at least $\lceil k/3 \rceil$ registers covered. Configuration
$C$ is quiescent if in $C$ no process has started but not finished executing a getTS() or
compare() call.
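
A small Python sketch may help to fix the signature notation. It assumes, purely for illustration, that registers are numbered $1, \dots, m$ and that a configuration is summarized by a hypothetical dictionary `poised` mapping each process to the register it covers (or `None`); the function names are likewise assumptions.

```python
from collections import Counter


def signature(poised, m):
    """sig(C): entry i is the number of processes covering register i."""
    counts = Counter(r for r in poised.values() if r is not None)
    return tuple(counts.get(i, 0) for i in range(1, m + 1))


def is_3k_configuration(sig, k):
    # k covering processes in total, and no register covered more than 3 times.
    return sum(sig) == k and all(c <= 3 for c in sig)


def three_covered(sig):
    """R_3(C): the registers whose signature entry equals 3."""
    return {i + 1 for i, c in enumerate(sig) if c == 3}


poised = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: None}
sig = signature(poised, m=3)            # processes 1-3 cover r1, 4-5 cover r2
covered = sum(1 for c in sig if c > 0)  # number of covered registers
```

With five covering processes the example is a $(3, 5)$-configuration, and the two covered registers match the bound $\lceil 5/3 \rceil = 2$.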

Lemma 3.1. Let $P$ be an arbitrary set of processes. Suppose for every reachable quiescent
configuration $D$ there exists a $P$-only schedule $\sigma$ such that $\sigma(D)$ is a
$(3, k)$-configuration. Then for any quiescent configuration $D$, there are two
$(3, k)$-configurations $C_0$ and $C_1$, and $P$-only schedules $\gamma_0$, $\gamma_1$, and
$\eta$ such that:

(a) $\gamma_0(D) = C_0$,

(b) $\gamma_1(C_0) = C_1$,

(c) $\mathrm{sig}(C_0) = \mathrm{sig}(C_1)$, and

(d) $\gamma_1 = \pi_{B_0}\pi_{B_1}\pi_{B_2}\eta$, where $B_0$, $B_1$ and $B_2$ are disjoint sets
of processes each covering $\mathcal{R}_3(C_0)$.

Proof. We inductively define an infinite sequence of schedules
$\lambda_0, \delta_0, \lambda_1, \delta_1, \dots, \lambda_i, \delta_i, \dots$ and reachable
$(3, k)$-configurations $E_0, E_1, E_2, \dots$, where $E_{i+1} = \lambda_i\delta_i(E_i)$, as
follows. $E_0$ is the $(3, k)$-configuration $\sigma(D)$ guaranteed by the hypothesis of the
lemma. Let $B_{0,i}$, $B_{1,i}$ and $B_{2,i}$ be disjoint sets of processes each covering
$\mathcal{R}_3(E_i)$. Execution $(E_i; \pi_{B_{0,i}}\pi_{B_{1,i}}\pi_{B_{2,i}})$ consists of
three consecutive block-writes to $\mathcal{R}_3(E_i)$ by the processes in $B_{0,i}$, $B_{1,i}$
and $B_{2,i}$, respectively. Schedule $\lambda_i$ is the concatenation of the sequence of
permutations $\pi_{B_{0,i}}\pi_{B_{1,i}}\pi_{B_{2,i}}$ and some $P$-only schedule $\alpha_i$ in
which every process in $P$ with a pending operation finishes that pending operation. Thus,
configuration $\lambda_i(E_i) = \pi_{B_{0,i}}\pi_{B_{1,i}}\pi_{B_{2,i}}\alpha_i(E_i)$ is
quiescent. So by the hypothesis there exists a schedule $\delta_i$ such that
$E_{i+1} = \lambda_i\delta_i(E_i)$ is again a $(3, k)$-configuration.

Since the set of signatures is finite, there are two indices $j < \ell$ such that
$\mathrm{sig}(E_j) = \mathrm{sig}(E_\ell)$. Fix two such indices $j$ and $\ell$. Let
$\gamma_0 = \sigma\lambda_0\delta_0\lambda_1\delta_1 \cdots \lambda_{j-1}\delta_{j-1}$ and
$\gamma_1 = \lambda_j\delta$, where
$\delta = \delta_j\lambda_{j+1}\delta_{j+1} \cdots \lambda_{\ell-1}\delta_{\ell-1}$.
Furthermore, let $C_0 = \gamma_0(D)$ and $C_1 = \gamma_1(C_0)$. By definition, the
configurations $C_0$ and $C_1$ satisfy (a) and (b). Moreover, by construction $C_0 = E_j$ and
$C_1 = E_\ell$, and since $\mathrm{sig}(E_j) = \mathrm{sig}(E_\ell)$, (c) is satisfied. Finally,
let $\eta = \alpha_j\delta$. Then $\gamma_1 = \pi_{B_{0,j}}\pi_{B_{1,j}}\pi_{B_{2,j}}\eta$,
where $B_{0,j}$, $B_{1,j}$, $B_{2,j}$ are disjoint sets of processes each covering
$\mathcal{R}_3(E_j) = \mathcal{R}_3(C_0)$. This proves (d). □
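
The pigeonhole step of this proof can be sketched in Python as follows. Since every entry of the signature of a $(3, k)$-configuration lies in $\{0, 1, 2, 3\}$, there are at most $4^m$ distinct signatures, so a long enough sequence must repeat one. The function name `first_repeat` and the sample signatures are assumptions made for the illustration.

```python
def first_repeat(signatures):
    """Return the first pair of indices (j, l), j < l, with equal signatures."""
    seen = {}
    for l, sig in enumerate(signatures):
        if sig in seen:
            return seen[sig], l
        seen[sig] = l
    return None  # no repetition in this finite prefix


# Signatures of hypothetical configurations E_0, E_1, E_2, E_3 with m = 3.
sigs = [(3, 0, 1), (2, 2, 0), (3, 1, 0), (2, 2, 0)]
pair = first_repeat(sigs)
```

Here the repeated signature occurs at positions 1 and 3, which would play the roles of $C_0$ and $C_1$ in the lemma.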

Let $\mathcal{P}_k$ denote the set $\{p_1, \dots, p_k\}$ and $\mathcal{P}_0$ denote the empty set of processes.

Lemma 3.2. For every $0 \le k \le \lfloor n/2 \rfloor$ and for every reachable quiescent
configuration $D$, there exists a $\mathcal{P}_{2k}$-only schedule $\sigma_k$ such that
$\sigma_k(D)$ is a $(3, k)$-configuration.

Proof. The proof is by induction on $k$. For $k = 0$ the claim is immediate by choosing
$\sigma_0$ to be the empty schedule.

Let $k \ge 1$, and let $D$ be an arbitrary reachable quiescent configuration. By the induction
hypothesis, for every reachable quiescent configuration $C$, there exists a
$\mathcal{P}_{2k-2}$-only schedule $\sigma_{k-1}$, such that $\sigma_{k-1}(C)$ is a
$(3, k-1)$-configuration. Hence, by Lemma 3.1 with $P = \mathcal{P}_{2k-2}$ there are two
reachable configurations $C_0$ and $C_1$, and $\mathcal{P}_{2k-2}$-only schedules $\gamma_0$,
$\gamma_1$, and $\eta$, such that $\gamma_0(D) = C_0$, $\gamma_1(C_0) = C_1$,
$\mathrm{sig}(C_0) = \mathrm{sig}(C_1)$, and $\gamma_1 = \pi_{B_0}\pi_{B_1}\pi_{B_2}\eta$, where
$B_0$, $B_1$ and $B_2$ are disjoint sets of processes, each covering $\mathcal{R}_3(C_0)$.

Consider the two processes $p_{2k-1}$ and $p_{2k}$. For $i \in \{0, 1\}$, let $\alpha_i$ be a
$\{p_{2k-i}\}$-only schedule such that in execution $(\pi_{B_i}(C_0); \alpha_i)$, $p_{2k-i}$
performs a complete getTS() instance. According to Lemma 2.1 there exists $i \in \{0, 1\}$ such
that $p_{2k-i}$ writes to some register not in $\mathcal{R}_3(C_0)$ during the execution
$(\pi_{B_i}(C_0); \alpha_i)$. (Note that whether $i = 0$ or $i = 1$ depends on $C_0$.) Let $r$ be
the first register not in $\mathcal{R}_3(C_0)$ to which $p_{2k-i}$ writes in
$(\pi_{B_i}(C_0); \alpha_i)$. Since $\mathrm{sig}(C_0) = \mathrm{sig}(C_1)$, we have
$r \notin \mathcal{R}_3(C_1)$, and thus $r$ is covered by at most two processes in $C_0$ as well
as in $C_1$.

Let $\lambda$ be the shortest prefix of $\alpha_i$ such that $p_{2k-i}$ is about to write to $r$
in $\pi_{B_i}\lambda(C_0)$. Since $p_{2k-i}$ does not participate in schedule
$\pi_{B_{1-i}}\pi_{B_2}\eta$, it is also covering $r$ in the configuration
$\pi_{B_i}\lambda\pi_{B_{1-i}}\pi_{B_2}\eta(C_0)$. Configurations
$\pi_{B_i}\pi_{B_{1-i}}\pi_{B_2}(C_0)$ and $\pi_{B_{1-i}}\pi_{B_i}\pi_{B_2}(C_0)$ are
indistinguishable to all processes; therefore, $\pi_{B_i}\pi_{B_{1-i}}\pi_{B_2}\eta(C_0) = C_1$.
Moreover, since $C_1 = \pi_{B_0}\pi_{B_1}\pi_{B_2}\eta(C_0)$ is indistinguishable from
$\pi_{B_i}\lambda\pi_{B_{1-i}}\pi_{B_2}\eta(C_0)$ to every process except $p_{2k-i}$, all
processes other than $p_{2k-i}$ cover the same registers in $C_1$ as in
$\pi_{B_i}\lambda\pi_{B_{1-i}}\pi_{B_2}\eta(C_0)$. Since $p_{2k-i}$ covers $r$ in this
configuration, and $r$ is covered by at most 2 other processes,
$\pi_{B_i}\lambda\pi_{B_{1-i}}\pi_{B_2}\eta(C_0)$ is a $(3, k)$-configuration. □



Lemma 3.2 shows that in any long-lived unbounded timestamp implementation that satisfies
non-deterministic solo-termination there exists a reachable
$(3, \lfloor n/2 \rfloor)$-configuration. Clearly, at least $\lfloor n/6 \rfloor > n/6 - 1$
registers are covered in this configuration. This proves Theorem 1.1.
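
The counting in this last step can be verified numerically. The Python sketch below, an illustration only with a hypothetical function name, computes the minimum number of registers covered in a $(3, \lfloor n/2 \rfloor)$-configuration, namely $\lceil \lfloor n/2 \rfloor / 3 \rceil$, and compares it with $\lfloor n/6 \rfloor$ and $n/6 - 1$.

```python
def min_covered_registers(n):
    """Fewest registers that floor(n/2) processes can cover, at most 3 per register."""
    k = n // 2
    return -(-k // 3)  # integer ceiling of k / 3


# Check the chain  ceil(floor(n/2) / 3) >= floor(n/6) > n/6 - 1  for small n.
chain_holds = all(
    min_covered_registers(n) >= n // 6 > n / 6 - 1
    for n in range(1, 2000)
)
```

The integer ceiling is written as `-(-k // 3)` to avoid floating point rounding.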



4. A SPACE LOWER BOUND FOR ONE-SHOT TIMESTAMPS 

It seems natural to imagine that $n$ registers would be required to construct a timestamp system
for $n$ processes. But this is not the case for some restricted versions of the problem. For
example, if the timestamps are not required to come from a nowhere dense set, then, as shown by
Ellen, Fatourou and Ruppert [Ellen et al. 2008], $n - 1$ registers suffice. We show that another
instance is when each process is restricted to at most one call to the getTS() method. In this
case $\Theta(\sqrt{n})$ registers are necessary and sufficient. This section contains the space
lower bound. Section 6 contains the algorithm that shows that this lower bound is asymptotically
tight.



Our lower bound proof relies on Lemma 4.1, the proof of which uses Lemma 2.1 inductively. Given
four disjoint sets of processes $B_1, B_2, B_3, U$ such that processes in $B_1, B_2, B_3$ cover a
set of registers $R$, then, according to Lemma 2.1, for any partition of $U$ into $V_1$ and
$V_2$, either all the processes in $V_1$ or all the processes in $V_2$ can be made to cover some
register outside of $R$. By choosing $V_1$ and $V_2$ to have sizes differing by at most one,
Lemma 2.1 can be used to ensure that essentially half of the processes in $V_1 \cup V_2$ must
write outside of $R$.

We strengthen this idea by using Lemma 2.1 inductively to construct an execution such that all
but one of the processes in $U$ that have not initiated any operation can be made to cover some
register outside of the set of registers $R$. Let participants($\sigma$) denote the set of the
processes taking steps in schedule $\sigma$. A process is idle in configuration $C$ if it is in
its initial state in $C$; the set of all such processes is denoted idle($C$).

Lemma 4.1. Let $C$ be a reachable configuration of a one-shot timestamp implementation from
registers that satisfies non-deterministic solo-termination. Let $B_0, B_1, B_2, U$ be disjoint
sets of processes where in $C$ each of $B_0$, $B_1$ and $B_2$ cover a set $R$ of registers and
$U \subseteq$ idle($C$), with $|U| \ge 2$. Then there is a schedule
$\beta\sigma\beta'\sigma'$ satisfying:

(a) $\{\beta, \beta'\} = \{\pi_{B_0}, \pi_{B_1}\}$;

(b) in configuration $\beta\sigma\beta'\sigma'(C)$ all processes in participants($\sigma$) and
participants($\sigma'$) cover a register outside of $R$;

(c) participants($\sigma$) $\cup$ participants($\sigma'$) $\subseteq U$;

(d) |participants($\sigma$)| + |participants($\sigma'$)| $= |U| - 1$;

(e) |participants($\sigma$)| $\ge \lfloor |U|/2 \rfloor \ge$ |participants($\sigma'$)|;

(f) $\sigma$ and $\sigma'$ are concatenations of solo schedules by distinct processes in $U$.

Proof. Let $U = \{p_0, \dots, p_m\}$, where $m \ge 1$ (because $|U| \ge 2$). For each
$1 \le k \le m$, we first inductively construct schedules $\delta_0^k$ and $\delta_1^k$ such
that

- participants($\delta_0^k$) and participants($\delta_1^k$) form a partition of
  $\{p_0, \dots, p_k\}$;

- for $i \in \{0, 1\}$, in execution $(C; \pi_{B_i}\delta_i^k)$:

  - each process in participants($\delta_i^k$) initiates exactly one instance of getTS();

  - exactly one getTS() method completes, and this getTS() is by the last process in
    $\delta_i^k$;

  - no process except possibly the last that occurs in $\delta_i^k$ writes outside of $R$;

- for $i \in \{0, 1\}$, in configuration $\pi_{B_i}\delta_i^k(C)$ every process in
  participants($\delta_i^k$) except possibly the last that occurs in $\delta_i^k$ covers a
  register outside of $R$.

For $i \in \{0, 1\}$, let $\delta_i^1$ be a $p_i$-only schedule in which process $p_i$ performs a
complete getTS() instance in the execution $(C; \pi_{B_i}\delta_i^1)$. Such a schedule
$\delta_i^1$ exists because $p_0$ and $p_1$ are in idle($C$). This immediately satisfies the base
case, $k = 1$.

For $i \in \{0, 1\}$, suppose that $\delta_i^k$ are constructed as required, and let $q_i$ denote
the last process in $\delta_i^k$. Since execution $(C; \pi_{B_i}\delta_i^k)$ contains a complete
getTS() by $q_i$, and no process in $\delta_i^k$ before $q_i$ writes outside of $R$ in
$(C; \pi_{B_i}\delta_i^k)$, Lemma 2.1 implies that either $q_0$ in execution
$(\pi_{B_0}(C); \delta_0^k)$ or $q_1$ in execution $(\pi_{B_1}(C); \delta_1^k)$ must write
outside of $R$. Choose a $j \in \{0, 1\}$ such that process $q_j$ does write outside of $R$ in
$(\pi_{B_j}(C); \delta_j^k)$. First truncate the schedule $\delta_j^k$ to, say, $\alpha_j^k$, by
deleting a suffix of the solo schedule of $q_j$ so that, instead of completing its getTS()
method, $q_j$ is paused at the earliest point such that at the end of the execution
$(\pi_{B_j}(C); \alpha_j^k)$, $q_j$ covers a register outside of $R$. Now append to $\alpha_j^k$
a $p_{k+1}$-only schedule $\sigma_{k+1}$ so that the execution
$(\pi_{B_j}(C); \alpha_j^k\sigma_{k+1})$ contains a complete getTS() method by $p_{k+1}$. Define
$\delta_j^{k+1}$ to be $\alpha_j^k\sigma_{k+1}$ and $\delta_{1-j}^{k+1}$ to be
$\delta_{1-j}^k$. The claimed construction now holds for $k + 1$.

Therefore, we can construct two schedules, $\delta_i^m$ for $i \in \{0, 1\}$, that together
contain all the processes of $U$ and where each is a concatenation of distinct solo-executions.
Furthermore, each of the executions $(C; \pi_{B_i}\delta_i^m)$ contains exactly one complete
getTS() by the last process in the schedule $\delta_i^m$, and no other process writes outside of
$R$. Therefore, applying Lemma 2.1 one more time, for a $j \in \{0, 1\}$, in the execution
$(C; \pi_{B_j}\delta_j^m)$, the last process in $\delta_j^m$ must write outside of $R$. Let
$\sigma_j$ be the schedule $\delta_j^m$ truncated to the first point such that at the end of
execution $(C; \pi_{B_j}\sigma_j)$ this last process covers a register outside of $R$. Let
$\sigma_{1-j}$ be the schedule $\delta_{1-j}^m$ truncated to remove the entire schedule of its
last process.

Now relabel the members of $\{\pi_{B_0}\sigma_0, \pi_{B_1}\sigma_1\}$ to have distinct names in
$\{\beta\sigma, \beta'\sigma'\}$ in such a way that the two schedules $\sigma_0$ and $\sigma_1$
are renamed with distinct names in $\{\sigma, \sigma'\}$ and satisfy
|participants($\sigma$)| $\ge$ |participants($\sigma'$)|. By construction,
participants($\sigma_0$) and participants($\sigma_1$) do not intersect; each is a subset of $U$;
and together they contain all but 1 of the members of $U$. Also, by construction, each of
$\sigma_0$ and $\sigma_1$ are concatenations of solo executions. When combined with the
relabeling, this establishes (a), (c), (d), (e) and (f).

Since participants($\sigma_0$) and participants($\sigma_1$) are disjoint sets, and since no
process writes outside of $R$ in the execution $(C; \pi_{B_i}\sigma_i)$ for $i \in \{0, 1\}$,
and since each block write obliterates all writes to $R$, configurations $\pi_{B_i}(C)$ and
$\pi_{B_{1-i}}\sigma_{1-i}\pi_{B_i}(C)$ are indistinguishable to participants($\sigma_i$). So
each process in participants($\sigma_i$) covers the same register in $\pi_{B_i}\sigma_i(C)$ as it
does in $\pi_{B_{1-i}}\sigma_{1-i}\pi_{B_i}\sigma_i(C)$ and as it does in
$\pi_{B_i}\sigma_i\pi_{B_{1-i}}\sigma_{1-i}(C)$. Consequently, in both
$\pi_{B_0}\sigma_0\pi_{B_1}\sigma_1(C)$ and $\pi_{B_1}\sigma_1\pi_{B_0}\sigma_0(C)$ each of the
$|U| - 1$ processes in participants($\sigma_0$) $\cup$ participants($\sigma_1$) covers a register
not in $R$. This, combined with the relabeling, establishes (b). □



Lemma 4.1 is the principal tool for our space lower bound for one-shot timestamps. To describe
the structure of the proof we use the following definitions. Let
$m = \lfloor \sqrt{2n} \rfloor$. Assume that the set of all registers, denoted $\mathcal{R}$, has
size at most $m$ since otherwise we are done. Define the ordered-signature of a configuration
$C$, denoted $\mathrm{ordSig}(C)$, to be the $m$-tuple $(s_1, s_2, \dots, s_m)$ where
$s_i \ge s_{i+1}$, and there is a permutation $\alpha$ of $\{1, \dots, m\}$ such that for
$1 \le i \le m$, $s_i$ processes are covering the $\alpha(i)$-th register. (The ordered-signature
of a configuration is just its signature with the entries of the $m$-tuple reordered so that they
are non-increasing. If only $k < m$ registers exist then
$s_{k+1} = s_{k+2} = \dots = s_m = 0$.) A configuration $C$ with
$\mathrm{ordSig}(C) = (s_1, \dots, s_m)$ is $\ell$-constrained if $s_c \le \ell - c$ for every
$1 \le c \le \ell$. A configuration $C$ is $(j, k)$-full if there is a set $R$ of registers such
that $|R| = j$ and in $C$ each register in $R$ is covered by at least $k$ processes. If $C$ is
$(j, k)$-full, $\mathcal{R}_{j,k}(C)$ denotes a set of such registers, otherwise
$\mathcal{R}_{j,k}(C)$ is undefined.
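
These three definitions translate directly into short predicates. The Python sketch below is illustrative only; the function names are hypothetical, and `is_full` relies on the observation that, once the entries are sorted in non-increasing order, $j$ registers with at least $k$ covering processes exist exactly when the $j$-th entry is at least $k$.

```python
def ord_sig(sig):
    """ordSig(C): the signature with entries sorted in non-increasing order."""
    return tuple(sorted(sig, reverse=True))


def is_constrained(osig, ell):
    # s_c <= ell - c for every 1 <= c <= ell; missing columns count as 0.
    return all(osig[c - 1] <= ell - c
               for c in range(1, min(ell, len(osig)) + 1))


def is_full(osig, j, k):
    # (j, k)-full: the j most heavily covered registers each have >= k processes.
    return 1 <= j <= len(osig) and osig[j - 1] >= k


osig = ord_sig((1, 4, 0, 2))
```

In this example the ordered-signature is $(4, 2, 1, 0)$, which is 5-constrained but not 4-constrained, and $(2, 2)$-full but not $(2, 3)$-full.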

If $C$ is $(j, k)$-full where $k \ge 3$, and there are $u \ge 2$ processes that are idle in $C$,
then Lemma 4.1 can be applied with $B_0, B_1, B_2$ any 3 disjoint sets each covering
$\mathcal{R}_{j,k}(C)$, so that for any $1 \le v \le u - 1$, $v$ processes can be made to cover
registers outside of $\mathcal{R}_{j,k}(C)$ using at most 2 block writes to
$\mathcal{R}_{j,k}(C)$.

We use this idea repeatedly to construct an execution that visits a sequence of configurations,
say $C_1, \dots, C_{last}$, such that the set of registers covered in $C_{i+1}$ is a superset of
the set covered in $C_i$, until eventually a configuration $C_{last}$ is reached in which at
least $m - \log n$ registers are covered.

Intuition for our construction is aided by a geometric representation of configurations. Configuration $C$ with $\mathrm{ordSig}(C) = (s_1, s_2, \ldots, s_m)$ is represented on a grid of cells where, in each column $c$, $1 \le c \le m$, the lowest $s_c$ cells are shaded. Thus each register corresponds to a column in the grid, but this correspondence can change for different configurations. With this interpretation, each shaded cell in column $c$ represents a process covering the register corresponding to $c$. If the configuration is $\ell$-constrained, the shading in each column remains below the stepped diagonal that starts at height $\ell - 1$ in the grid. The configuration is $(j, k)$-full if in column $j$ (and hence in all columns 1 through $j$) the height of the shaded cells is at least $k$.

An overview of the construction is as follows. We first achieve an $m$-constrained $(j, m - j)$-full configuration for some $j \ge 1$ as shown in Figure 1.

Given some $\ell$-constrained $(j, \ell - j)$-full configuration (such as shown in Figure 1 with $m = \ell$) and provided $\ell - j$ is at least 3, we can apply Lemma 4.1 using 3 disjoint sets of processes each occupying cells in columns 1 through $j$ for the sets $B_0$, $B_1$ and $B_2$. Then, one at a time, idle processes can be made to occupy cells in columns $j + 1$ through $m$. We will maintain the invariant that the number of idle processes is always greater than the number of unshaded cells that are under the stepped diagonal and in columns $j + 1$ through $m$. Because of this invariant, we can be sure to reach a configuration $C'$ where, for the first time (when the columns $j + 1$ through $m$ are rearranged in order of non-increasing number of occupants), some column $j' \ge j + 1$ gets $\ell - j'$ occupants. During this execution the block writes reduced the height of the shaded cells in columns 1 through $j$ by one or two. If only one block write happened during this execution, or if $j' \ge j + 2$, $C'$ is again an $\ell$-constrained $(j', \ell - j')$-full configuration (Case 1 of Figure 2).

The only other case is when both block writes were used to achieve $C'$ and $j' = j + 1$ (Case 2 of Figure 2). Then $C'$ is an $(\ell - 1)$-constrained $(j', \ell - 1 - j')$-full configuration. In this case, however, at least half of the idle processes have moved to occupy cells in columns $j + 1$ through $m$. So this reduction by one in the stepped boundary of the grid can only happen $\log n$ times. Thus, each repetition of this construction creates an $(m - s)$-constrained $(k, m - k - s)$-full configuration where $s \in O(\log n)$. The construction can be repeated until either there are fewer than 2 idle processes or $m - k - s < 3$. In both cases at least $m - s = \sqrt{2n} - O(\log n)$ registers are covered.

Fig. 1. Configuration $C_1$ must have a column $j$ that reaches to the diagonal. Hence there are $j$ registers each covered with $m - j$ processes.

Fig. 2. After the block-write, processes are run until some new column $j'$ reaches the diagonal and thus has height $\ell - j'$. Case 1: columns 1 through $j$ still have height at least $\ell - j'$. Case 2: the diagonal is reached at column $j + 1$ after two block writes. This can only happen if at least half of the unshaded space in columns $j + 1$ through $m$ became shaded.

The rest of this section contains the details of this construction, which provides the proof of Theorem 1.2. We assume that $n > 3$ since otherwise the theorem is trivially correct. For configuration $C$, and a set of registers $R \subseteq \mathcal{R}$, $\mathrm{poised}(C, R)$ denotes the processes that are covering some register in $R$. For any set of registers $R$, $\overline{R}$ denotes the set $\mathcal{R} \setminus R$.

The construction is inductive, starting with the initial configuration $C_0$. Initialize $j_0 = 0$, $\ell_0 = m$ and $R_0 = \emptyset$.

In $C_0$, no register is covered. Set $B_0 = B_1 = B_2 = \emptyset$, let $U$ be the set of all $n$ processes and apply Lemma 4.1. Because $\pi_{B_i}$ is the empty schedule for $i \in \{0, 1, 2\}$, the schedule produced is $\sigma\sigma'$ and $n - 1$ processes cover some register in configuration $\sigma\sigma'(C_0)$. Let $\mathrm{ordSig}(\sigma\sigma'(C_0)) = (s_1, s_2, \ldots, s_m)$. If $s_m \ge 1$, we are done since then $m$ registers are covered. Therefore, assume $s_m = 0$, and suppose $s_c \le m - c - 1$ for each $c \le m - 1$. Then, $\sum_{c=1}^{m} s_c \le \sum_{c=1}^{m-1} (m - c - 1) + s_m = (m - 1)(m - 2)/2 + 0 < n - 1$, which is impossible. Hence, there is at least one $j \le m - 1$ satisfying $s_j \ge m - j$ and $\sigma\sigma'(C_0)$ is therefore $(j, m - j)$-full. Let $\gamma_1$ be the shortest prefix of $\sigma\sigma'$ so that there is such a value, which we label $j_1$, satisfying $\gamma_1(C_0)$ is a $(j_1, m - j_1)$-full configuration. Configuration $\gamma_1(C_0)$ must also be $m$-constrained, because otherwise, there is some index $i$ such that $i$ registers are covered by at least $m - i + 1$ processes in configuration $\gamma_1(C_0)$. But then there is a proper prefix, $\alpha$, of $\gamma_1$ such that $\alpha(C_0)$ is an $(i, m - i)$-full configuration, for some $i$. Define $C_1 = \gamma_1(C_0)$, $\ell_1 = m$ and $R_1 = \mathcal{R}_{j_1, m - j_1}(C_1)$.
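The counting step in this base case can be checked numerically. The sketch below assumes $m = \lfloor\sqrt{2n}\rfloor$ and treats an ordered signature as a plain tuple; the helper names `counting_bound_holds` and `full_column` are illustrative only.

```python
import math
from itertools import product


def counting_bound_holds(n):
    """Check (m-1)(m-2)/2 < n-1 for m = floor(sqrt(2n))."""
    m = math.isqrt(2 * n)
    return (m - 1) * (m - 2) // 2 < n - 1  # (m-1)(m-2) is always even


def full_column(sig):
    """Smallest j <= m-1 with s_j >= m - j, or None if no column reaches the diagonal."""
    m = len(sig)
    for j in range(1, m):
        if sig[j - 1] >= m - j:
            return j
    return None


# Enumerate every ordered signature for n = 8 in which n - 1 processes
# cover registers and the last of the m registers is uncovered (s_m = 0).
n = 8
m = math.isqrt(2 * n)
signatures = [
    s + (0,)
    for s in product(range(n), repeat=m - 1)
    if sum(s) == n - 1 and all(s[i] >= s[i + 1] for i in range(m - 2))
]
every_signature_has_full_column = all(full_column(s) is not None for s in signatures)
```

For this small instance every enumerated signature has some column $j$ with $s_j \ge m - j$, which is the pigeonhole conclusion used to obtain $j_1$.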

In execution $(C_0; \gamma_1)$, each process $p$ in $\mathrm{participants}(\gamma_1)$ leaves the set $\mathrm{idle}(C_0)$ and performs a $p$-only execution until it is paused when it covers a register. Therefore, $|\mathrm{poised}(C_1, \mathcal{R})| + |\mathrm{idle}(C_1)| = n$. At most $\sum_{c=1}^{j_1} (m - c - 1) + 1$ processes cover registers in $R_1$. The remainder of at least $n - \left(\sum_{c=1}^{j_1} (m - c - 1) + 1\right) > \sum_{c=j_1+1}^{m} (m - c)$ processes are either still idle or are covering registers in $\overline{R_1}$. So for $i = 1$, the following construction invariant holds:



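(a) $C_i = \gamma_i(C_{i-1})$

(b) $R_{i-1} \subset R_i$

(c) $|\mathrm{poised}(C_i, \overline{R_i})| + |\mathrm{idle}(C_i)| - 1 \ge \sum_{c=j_i+1}^{m} (m - c)$

(d) $j_i \ge j_{i-1} + 1$ and $\ell_i \in \{\ell_{i-1}, \ell_{i-1} - 1\}$ and $\ell_i \le m$

(e) $C_i$ is a $(j_i, \ell_i - j_i)$-full configuration with $R_i = \mathcal{R}_{j_i, \ell_i - j_i}(C_i)$.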



Now suppose that a sequence of tuples $(\gamma_1, C_1, j_1, \ell_1, R_1), \ldots, (\gamma_k, C_k, j_k, \ell_k, R_k)$ has been built so that the construction invariant holds for each. If $\ell_k - j_k \ge 3$ and $|\mathrm{idle}(C_k)| \ge 2$ then let $B_0, B_1, B_2$ be disjoint sets of processes, such that each covers $R_k$ and each has size $|R_k|$, and let $U = \mathrm{idle}(C_k)$. According to Lemma 4.1 there is a schedule $\beta\sigma\beta'\sigma'$ satisfying:

- $\beta$ and $\beta'$ are block writes by $B_0$ and $B_1$,

- $\sigma$ and $\sigma'$ are concatenations of solo schedules by distinct processes in $\mathrm{idle}(C_k)$,

- $|\mathrm{participants}(\sigma)| \ge \lfloor |\mathrm{idle}(C_k)|/2 \rfloor$,

- $|\mathrm{participants}(\sigma)| + |\mathrm{participants}(\sigma')| = |\mathrm{idle}(C_k)| - 1$, and

- in configuration $\beta\sigma\beta'\sigma'(C_k)$ all processes in $\mathrm{participants}(\sigma)$ and $\mathrm{participants}(\sigma')$ cover a register in $\overline{R_k}$.



So, combining with (c) of the construction invariant, $|\mathrm{poised}(\beta\sigma\beta'\sigma'(C_k), \overline{R_k})| \ge \sum_{c=j_k+1}^{m} (m - c)$. Hence, there must be a non-empty set of registers $Q' \subseteq \overline{R_k}$ such that each is covered by at least $\ell_k - j_k - |Q'|$ processes. Let $\gamma_{k+1}$ be the shortest prefix of $\beta\sigma\beta'\sigma'$ such that there is such a $Q'$, which we call $Q$, in $\gamma_{k+1}(C_k)$ and let $\nu_k = |Q|$, where $\nu_k \ge 1$. Define $C_{k+1} = \gamma_{k+1}(C_k)$, $R_{k+1} = Q \cup R_k$, and $j_{k+1} = \nu_k + j_k$. Then $|R_{k+1}| = \nu_k + j_k = j_{k+1}$. During execution $(C_k; \gamma_{k+1})$, processes that leave $\mathrm{idle}(C_k)$ pause when they cover a register in $\overline{R_k}$. So $|\mathrm{poised}(C_k, \overline{R_k})| + |\mathrm{idle}(C_k)| = |\mathrm{poised}(\gamma_{k+1}(C_k), \overline{R_k})| + |\mathrm{idle}(\gamma_{k+1}(C_k))|$. Therefore, by (c), $|\mathrm{poised}(C_{k+1}, \overline{R_k})| + |\mathrm{idle}(C_{k+1})| - 1 \ge \sum_{c=j_k+1}^{m} (m - c)$.

Furthermore, since $\gamma_{k+1}$ is chosen to be as short as possible,

$$|\mathrm{poised}(C_{k+1}, Q)| \le \sum_{c=j_k+1}^{j_{k+1}} (\ell_k - c - 1) + 1 \le \sum_{c=j_k+1}^{j_{k+1}} (m - c).$$

Therefore,

$$|\mathrm{poised}(C_{k+1}, \overline{R_k} \setminus Q)| + |\mathrm{idle}(C_{k+1})| - 1 \ge \sum_{c=j_k+1}^{m} (m - c) - \sum_{c=j_k+1}^{j_{k+1}} (m - c) = \sum_{c=j_{k+1}+1}^{m} (m - c).$$

Thus (a), (b), and (c) of the construction invariant hold for k + 1. For parts (d) and (e) 
there are two cases. 

Case 1: $\gamma_{k+1}$ is a prefix of $\beta\sigma$ or $\nu_k \ge 2$. If $\gamma_{k+1}$ is a prefix of $\beta\sigma$ then there is only one block write to $R_k$. So in $C_{k+1}$, each of the $j_k$ registers in $R_k$ remains covered by at least $\ell_k - j_k - 1$ processes and each of the $\nu_k$ registers in $Q$ is covered by at least $\ell_k - j_k - \nu_k \le \ell_k - j_k - 1$ processes. If $\nu_k \ge 2$, then in $C_{k+1}$, each of the $j_k$ registers in $R_k$ remains covered by at least $\ell_k - j_k - 2$ processes and each of the $\nu_k$ registers in $Q$ is covered by at least $\ell_k - j_k - \nu_k \le \ell_k - j_k - 2$ processes. So in either situation, each of the $j_{k+1} = j_k + \nu_k$ registers in $R_{k+1}$ is covered by at least $\ell_k - j_k - \nu_k = \ell_k - j_{k+1}$ processes. Therefore, setting $\ell_{k+1} = \ell_k$ we have that $C_{k+1}$ is a $(j_{k+1}, \ell_{k+1} - j_{k+1})$-full configuration and the construction invariant holds.

Case 2: $\nu_k = 1$ and $\gamma_{k+1}$ is not a prefix of $\beta\sigma$. In this case there are two block writes to $R_k$. So in $\gamma_{k+1}(C_k)$, each register in $R_k$ remains covered by only $\ell_k - j_k - 2$ processes, which is one fewer than the number of processes covering the single register in $Q$. Since $j_{k+1} = j_k + 1$, we can set $\ell_{k+1} = \ell_k - 1$ to ensure that $C_{k+1}$ is a $(j_{k+1}, \ell_{k+1} - j_{k+1})$-full configuration. So, again the construction invariant holds.

This construction ends in a configuration $C_{last}$ where either $\ell_{last} - j_{last} \le 2$ or $|\mathrm{idle}(C_{last})| \le 1$, since in either case Lemma 4.1 can no longer be applied. Clearly, $C_{last} = \gamma_1\gamma_2 \cdots \gamma_{last}(C_0)$ so $C_{last}$ is reachable. Since, in $C_{last-1}$, every register in $R_{last-1}$ was covered by at least 3 processes, every register in $R_{last}$ is covered by at least one process. So it only remains to bound $|R_{last}| = j_{last}$ from below.

First we show that $|\mathrm{idle}(C_{last})| \le 1$ is not possible. Intuitively, this is because during execution $(C_0; \gamma_1\gamma_2 \cdots \gamma_{last})$ processes pause in such a way that each of the constructed configurations $C_1, \ldots, C_{last}$ is $m$-constrained, which does not allow enough room to use $n - 1$ processes.

To make this precise, let $\gamma$ denote $\gamma_1\gamma_2 \ldots \gamma_{last}$, and say that process $p$ is associated with register $r$ if $r$ is the last register that $p$ covers during execution $(C_0; \gamma)$. During the execution $(C_0; \gamma)$, processes no longer become associated with a register $r$ after $r$ becomes a member of $R_i$ for some $i$. Let $f(r)$ be the smallest step number, $i$, such that $r \in R_i$ (and $f(r) = last$ otherwise). Also, for each register $r$, let $g(r) = |\{p \mid p \text{ is associated with } r\}|$. We must have $g(r) = |\mathrm{poised}(C_{f(r)}, \{r\})|$. If $|\mathrm{idle}(C_{last})| \le 1$, then each of $n - 1$ processes is associated with a register. So $n - 1 \le \sum_{r \in \mathcal{R}} g(r) = \sum_{r \in \mathcal{R}} |\mathrm{poised}(C_{f(r)}, \{r\})|$. But by construction, each $C_i$ is $\ell_i$-constrained and therefore $m$-constrained. Thus $\sum_{r \in \mathcal{R}} |\mathrm{poised}(C_{f(r)}, \{r\})| \le \sum_{c=1}^{m} (m - c)$. But then $n - 1 \le \sum_{c=1}^{m} (m - c) = m(m - 1)/2$, which can hold only if $m > \sqrt{2n}$ (since $n > 3$).

We can therefore conclude that the construction must have ended because $\ell_{last} - j_{last} \le 2$. So, now we show that if $\ell_{last} - j_{last} \le 2$ then $j_{last}$ is at least $m - \log n - 2$. Let $\delta$ be the number of times that Case 2 occurred in the creation of $(\gamma_1, C_1, j_1, \ell_1, R_1), \ldots, (\gamma_{last}, C_{last}, j_{last}, \ell_{last}, R_{last})$. Because $\ell_0 = m$ and $\ell_i$ decreases only for this case and only by one each time, $\ell_{last} = m - \delta$. Consider a step $i$ where Case 2 occurs, with $\gamma_i = \beta\sigma\beta'\sigma'$. By Lemma 4.1, $|\mathrm{participants}(\sigma)| \ge \lfloor |\mathrm{idle}(C_k)|/2 \rfloor$ so $|\mathrm{participants}(\sigma\sigma')| \ge \lceil |\mathrm{idle}(C_k)|/2 \rceil$. Since $|\mathrm{idle}(C_0)| = n$ and $|\mathrm{idle}(C_{i+1})| \le |\mathrm{idle}(C_i)|$ it follows that Case 2 can occur at most $\log n$ times. Consequently, $\delta \le \log n$ implying $\ell_{last} \ge m - \log n$. Hence, $j_{last} \ge \ell_{last} - 2 \ge m - \log n - 2$.
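The logarithmic bound on the number of Case 2 steps is a halving argument, and a short sketch can illustrate it. It assumes the reading that each Case 2 step removes at least $\lceil |\mathrm{idle}|/2 \rceil$ processes from the idle set and that a step needs at least 2 idle processes; the function name `max_case2_steps` is hypothetical, and the loop models the worst case in which exactly that many processes leave.

```python
def max_case2_steps(n):
    """Count how often the idle set can lose ceil(idle/2) processes, starting from n."""
    idle, steps = n, 0
    while idle >= 2:              # Lemma 4.1 needs at least 2 idle processes
        idle -= (idle + 1) // 2   # at least ceil(idle/2) processes stop being idle
        steps += 1
    return steps
```

Under these assumptions the count never exceeds $\log_2 n$, which is why the stepped boundary can drop by at most $\log n$ over the whole construction.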



This completes the proof of Theorem 1.2.



5. A SIMPLE ONE-SHOT TIMESTAMPS IMPLEMENTATION USING $\lceil n/2 \rceil$ REGISTERS

Algorithms 1 and 2 implement one-shot timestamps for $n$ processes using $\lceil n/2 \rceil$ registers and thus beat the space used by any register implementation of long-lived timestamps. It is of interest only because of its simplicity; in Section 6 we improve on this space complexity with a more complicated algorithm, which shows that the space lower bound of Section 4 is asymptotically tight.

The simple-getTS() method by process $p$ reads each of the registers in sequence, updates the value of the register that $p$ shares by adding one to what $p$ read, and returns as $p$'s timestamp the sum of all the values read. The simple-compare($t_1$, $t_2$) method returns the truth value of $t_1 < t_2$.

ALGORITHM 1: simple-compare(t1, t2)
    return t1 < t2;

ALGORITHM 2: simple-getTS()
    // R[1...⌈n/2⌉] is a shared array of multi-reader/2-writer registers each with a
    // value in {0,1,2} and initialized to 0. Register R[i] is written by processes 2i
    // and 2i + 1.
    // sum is a local variable
    sum := 0;
    for i = 1 ... ⌈n/2⌉ do
        if i = ⌊p/2⌋ then
            R[i] := R[i] + 1;
        end
        sum := sum + R[i];
    end
    return sum;



Lemma 5.1. Algorithms 1 and 2 constitute a waitfree implementation of one-shot timestamps for an asynchronous system of $n$ processes.

Proof. Clearly both methods simple-compare and simple-getTS are waitfree. Let $p$ and $q$ be two processes that perform a simple-getTS method call and let $t_p$ and $t_q$ be their corresponding timestamps. Assume that $p$.simple-getTS() happens before $q$.simple-getTS(). Each process writes either 1 or 2 to its register and only writes 2 if it observed that its register already held 1. Because it is one-shot, any such observed 1 must have been written by the observing process' partner, and thus the value in each register never decreases. Consequently, the value of sum also never decreases so $t_p \le t_q$. Since $p$.simple-getTS() happens before $q$.simple-getTS(), $q$'s sum will also account for the additional 1 that $q$ writes to its own register and that is not observed by $p$. Therefore $t_p < t_q$. □
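The behaviour argued in Lemma 5.1 can be observed in a purely sequential sketch. It is only an illustration under stated assumptions: processes are numbered from 0 so that process `p` shares register `p // 2` with its partner, the calls run one after another (no interleaving of the separate read and write steps is modeled), and the class name `SimpleOneShot` is hypothetical.

```python
class SimpleOneShot:
    """Sequential model of the simple one-shot timestamp object."""

    def __init__(self, n):
        self.n = n
        self.R = [0] * ((n + 1) // 2)  # ceil(n/2) registers, all initially 0

    def get_ts(self, p):
        """One-shot call by process p (0-based); returns the sum of the values read."""
        total = 0
        for i in range(len(self.R)):
            if i == p // 2:
                self.R[i] = self.R[i] + 1  # add one to the value p read in its own register
            total += self.R[i]
        return total


def simple_compare(t1, t2):
    return t1 < t2


obj = SimpleOneShot(5)
stamps = [obj.get_ts(p) for p in [3, 0, 4, 1, 2]]
```

With non-overlapping calls each invocation sees one more increment than the previous one, so the timestamps come out strictly increasing regardless of the order in which the processes are scheduled.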

6. AN ASYMPTOTICALLY TIGHT SPACE UPPER BOUND FOR ONE-SHOT TIMESTAMPS

We now present a waitfree algorithm for any timestamp system that invokes at most $M$ getTS method calls, which uses $\lceil 2\sqrt{M} \rceil$ registers. In particular, the algorithm uses $\lceil 2\sqrt{n} \rceil$ registers for an $n$-process one-shot timestamp system, thus establishing Theorem 1.3 and showing that the space lower bound of Section 4 is asymptotically tight.

Timestamps are ordered pairs $(rnd, turn) \in \mathbb{N} \times (\mathbb{N} \cup \{0\})$. The compare method simply compares timestamps lexicographically without accessing shared memory (see Algorithm 3).

ALGORITHM 3: compare((rnd1, turn1), (rnd2, turn2))
1   return (rnd1 < rnd2) ∨ ((rnd1 = rnd2) ∧ (turn1 < turn2))



6.1. The getTS algorithm

Algorithm 4 provides the getTS method. It uses the parameter $m$, the number of shared registers, which is a function $m = f(M)$, where $M$ is the maximum number of getTS method calls. We will prove that $f(M) = \lceil 2\sqrt{M} \rceil$ suffices. Each process numbers its own getTS method calls sequentially. The $k$-th time that $p$ invokes getTS, it does so using $ID = p.k$. We refer to these IDs as getTS-ids. When specialized to one-shot timestamps, ID can be just the invoking process' identifier.

ALGORITHM 4: getTS(ID)
    /* For the k-th invocation by process p, ID = p.k. */
    Shared:
        R[1...m]: array of multi-writer multi-reader registers, initialized to ⊥;
    Local:
        r[1...m] initialized to ⊥;
        j initialized to 1;
        myrnd;
 1  while R[j] ≠ ⊥ do
 2      r[j] = R[j]
 3      j = j + 1
    end
 4  myrnd = j - 1
 5  for j = 1 ... myrnd - 1 do
 6      if R[myrnd + 1] == ⊥ then
 7          if r[myrnd].seq[j] == last(R[j].seq) then
 8              R[j] = ((ID), myrnd);
 9              return (myrnd, j)
10          else if R[j].rnd < myrnd then
11              R[j] = ((ID), myrnd);
            end
        else
12          return (myrnd + 1, 0)
        end
    end
13  r[1...m] = scan(R[1], ..., R[m])
14  if r[myrnd + 1] == ⊥ then
15      R[myrnd + 1] = ((last(r[1].seq), ..., last(r[myrnd].seq), ID), myrnd + 1)
    end
16  return (myrnd + 1, 0)

The shared data structure used in the getTS() method is an array of $m$ multi-writer multi-reader atomic registers. The content of each register is either $\bot$ (the initial value) or an ordered pair $(seq, rnd)$ where $seq$ is a sequence of getTS-ids, and $rnd$ is a positive integer. The algorithm maintains the invariant that for some integer $k \ge 0$ the first $k$ registers are non-$\bot$ and all other registers are $\bot$. Moreover, the sequence $R[j].seq$ for $j \le k$ has length either 1 or $j$. The $i$-th element of $seq$ is denoted $seq[i]$, and $last(R[j].seq)$ is the last element of the sequence $R[j].seq$.

The algorithm uses the well-known obstruction-free scan method due to Afek, Attiya, Dolev, Gafni, Merritt and Shavit [Afek et al. 1993], which returns a successful-double-collect. A collect reads each $R[1], \ldots, R[m]$ successively and returns the resulting view. A successful-double-collect($R[1], \ldots, R[m]$) repeatedly collects until two contiguous views are identical. The scan can be linearized at any point between its last two collects. Although this scan is not wait-free in general, the use of it by Algorithm 4 is. This is because, in any execution, each getTS() performs at most $m - 1$ writes, so each scan operation will be successful after a finite number of collects. Since scan is linearizable, we treat it as atomic for the remainder of this section.
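A minimal sketch of the double-collect scan described above follows. It assumes the shared array is reached through a caller-supplied `read(i)` function for registers $1, \ldots, m$; the names `collect`, `scan` and `read` are illustrative. Comparing views by value is plausible here because each write stores a distinct last getTS-id, but that is a property of this algorithm rather than of double collects in general.

```python
def collect(read, m):
    """Read registers 1..m one after another and return the resulting view."""
    return [read(i) for i in range(1, m + 1)]


def scan(read, m):
    """Repeat collects until two consecutive views are identical, then return that view."""
    previous = collect(read, m)
    while True:
        current = collect(read, m)
        if current == previous:
            return current
        previous = current


# Usage: simulate one concurrent write that lands in the middle of the first collect.
registers = {1: "a", 2: "b", 3: "c"}
reads = {"count": 0}


def read(i):
    reads["count"] += 1
    if reads["count"] == 2:
        registers[1] = "z"  # register 1 changes after it was already read once
    return registers[i]


view = scan(read, 3)
```

In this run the first two collects differ, so a third collect is needed before the scan returns; the loop terminates as soon as writers stop interfering, which is the obstruction-free behaviour the text relies on.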

The idea of the algorithm is as follows. An execution proceeds in phases. During phase $k$, $R[1]$ through $R[k-1]$ are non-$\bot$; $R[k+1]$ to $R[m]$ are $\bot$; $R[k]$ is either written or some getTS() is poised to write to it for the first time. Every write to $R[k]$ during phase $k$ is a pair $(seq, rnd)$, which stores a sequence of $k$ getTS-ids in $seq$. We say that register $R[i]$ is valid if the phase is $k$ and the last entry stored in $R[i].seq$ equals the $i$-th entry stored in $R[k].seq$.

Roughly speaking, phase $k - 1$ ends when some getTS(q) method discovers that all registers $R[1]$ through $R[k-1]$ are invalid. Then getTS(q) performs a scan, which returns the view $(r_1, \ldots, r_{k-1}, \bot, \ldots, \bot)$. The $k$-th phase starts precisely at this scan. Then getTS(q) prepares to write the sequence $(\ell_1, \ldots, \ell_{k-1}, q)$ to $R[k].seq$, where $\ell_i = last(r_i.seq)$ for $1 \le i \le k - 1$. First imagine, for simplicity, that getTS(q)'s scan and subsequent write to $R[k]$ occur in one atomic operation. In that case, at the beginning of the $k$-th phase, every register $R[i]$, $1 \le i \le k - 1$, is valid. Because getTS(q) started phase $k$, it returns the timestamp $(k, 0)$.

For the rest of phase $k$, any other getTS(p) method that began in phase $k$ examines the registers $R[1]$ through $R[k-1]$ in this order looking for the first register that is valid. If it finds one, say $R[i]$, it writes $(p, k)$ to $R[i]$, thus invalidating $R[i]$ by making $last(R[i].seq)$ differ from the $i$-th entry stored in $R[k].seq$, and returns the timestamp $(k, i)$. If it fails to find one, it will perform a scan and prepare to start phase $k + 1$. Observe that this algorithm works correctly if all getTS() calls are sequential: the getTS() that starts phase $k$ returns $(k, 0)$ and the $j$-th getTS() call after that, for $1 \le j \le k - 1$, invalidates $R[j]$ and returns $(k, j)$.
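The sequential behaviour just described can be reproduced with a small single-threaded sketch of getTS. It is an illustration only: registers are modeled as Python tuples `(seq, rnd)` with `None` standing for $\bot$, the scan degenerates to a copy because nothing runs concurrently, the read of `R[j]` into `r[j]` inside the while-loop is inferred from the proofs' references to line 2, and the class name `SequentialTimestamps` is hypothetical. Comments give the pseudocode line numbers.

```python
BOTTOM = None


def compare(t1, t2):
    (rnd1, turn1), (rnd2, turn2) = t1, t2
    return rnd1 < rnd2 or (rnd1 == rnd2 and turn1 < turn2)


class SequentialTimestamps:
    def __init__(self, m):
        self.m = m
        self.R = [BOTTOM] * (m + 2)  # R[1..m] are used; R[m+1] is a sentinel never written

    def get_ts(self, ident):
        R = self.R
        r = [BOTTOM] * (self.m + 2)
        j = 1
        while R[j] is not BOTTOM:              # line 1
            r[j] = R[j]                        # line 2
            j += 1                             # line 3
        myrnd = j - 1                          # line 4
        for j in range(1, myrnd):              # line 5: j = 1 .. myrnd - 1
            if R[myrnd + 1] is BOTTOM:         # line 6
                if r[myrnd][0][j - 1] == R[j][0][-1]:  # line 7: seq[j] vs last(R[j].seq)
                    R[j] = ((ident,), myrnd)   # line 8
                    return (myrnd, j)          # line 9
                elif R[j][1] < myrnd:          # line 10
                    R[j] = ((ident,), myrnd)   # line 11
            else:
                return (myrnd + 1, 0)          # line 12
        r = list(R)                            # line 13: sequential stand-in for scan
        if r[myrnd + 1] is BOTTOM:             # line 14
            if myrnd + 1 > self.m:
                raise RuntimeError("m is too small for this many calls")
            seq = tuple(r[i][0][-1] for i in range(1, myrnd + 1)) + (ident,)
            R[myrnd + 1] = (seq, myrnd + 1)    # line 15
        return (myrnd + 1, 0)                  # line 16


ts = SequentialTimestamps(m=6)
stamps = [ts.get_ts(name) for name in "abcdefg"]
```

In this run phase $k$ hands out exactly $k$ timestamps, $(k, 0)$ through $(k, k-1)$, before the next phase begins, so seven sequential calls end in phase 4.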

There are several complications and subtleties that arise due to concurrent getTS() executions. Suppose a getTS() that began in phase $k$ sleeps before it writes its invalidation to a register $R[i]$. If it wakes up during some later phase $k'$, its write could invalidate $R[i]$ for phase $k'$ making timestamp $(k', i)$ unusable, and so increase the space requirements. Such damage is confined to at most one such wasted timestamp per getTS() method as follows. Each getTS(p) begins by setting its local variable, $myrnd_p$, to the largest value such that $R[myrnd_p]$ is non-$\bot$. Before each of its writes, getTS(p) checks that $R[myrnd_p + 1]$ is still $\bot$. If it is not, the phase must have advanced, and getTS(p) can safely terminate with timestamp $(myrnd_p + 1, 0)$.

A more serious potential problem due to concurrency occurs when our simplifying assumption above (that the scan and subsequent write occur in one atomic operation) does not hold. Suppose at the end of phase $k - 1$, both getTS(p) and getTS(q) are poised to execute a scan and then write the result into $R[k]$. If, after getTS(p)'s scan and before getTS(q)'s scan, some "old" writes happen to some registers, say $R[1], \ldots, R[j]$, the results of their scans will differ. After both scans, getTS(q)'s view matches all register values, but getTS(p)'s view matches only the contents of $R[j+1], \ldots, R[k-1]$. Now let both getTS(p) and getTS(q) proceed until both are poised to write the result computed from their view to $R[k]$, and suppose getTS(p) writes first. At this point, registers $R[1], \ldots, R[j]$ are already invalid because of getTS(p)'s out-of-date view. Another getTS(a) starting at this point will invalidate $R[j+1]$ and return timestamp $(k, j+1)$. If after that, getTS(q) writes to $R[k]$, the first $j$ registers could become valid, and getTS(b) beginning after getTS(a) completes would invalidate $R[1]$ and return timestamp $(k, 1)$, which is incorrect because it is less than getTS(a)'s timestamp. This problem is eliminated by ensuring that when getTS(a) determines that a register $R[i]$ is invalid, it will remain invalid for the duration of the phase. One way to achieve this is to have getTS(a) overwrite the invalid register $R[i]$ with $(a, myrnd_a)$ before it moves on to investigate the validity of $R[i+1]$. This simple repair to correctness, however, can increase space complexity. Instead, the overwriting by getTS(a) is done only when necessary. Specifically, getTS(a) determines that a register $R[i]$ is invalid by reading a pair $(seq_i, rnd_i)$ from $R[i]$ and finding that $last(seq_i)$ is not equal to its view of the $i$-th value in $R[k].seq$. If $rnd_i = k$ then this invalidation cannot be due to an old write from an earlier phase, so no overwriting is needed. In the algorithm, getTS(a) overwrites register $R[i]$ with $(a, k)$ only when it read $rnd_i < k$.

As we shall see, these additional techniques are enough to convert the idea of a timestamp object that is correct under sequential accesses to an algorithm for concurrent timestamps that is correct (Lemma 6.4), space efficient (Lemma 6.5) and waitfree (Lemma 6.14). These three lemmas, when specialized to the one-shot case, constitute the proof of Theorem 1.3.

6.2. Algorithms 3 and 4 Correctly Implement Timestamps

We isolate some properties of Algorithm 4 that will serve to simplify both the correctness and complexity arguments. In the following, the local variable $x$ in the code of Algorithm 4 is denoted by $x_{id}$ when it is used in the method call of getTS(id).

Claim 6.1. In any execution

(a) once the content of a shared register becomes non-$\bot$, it remains non-$\bot$; and
(b) for any $j$, $1 \le j \le m$, the value of $last(R[j].seq)$ that is written by each write to $R[j]$ is distinct.

In any configuration of an execution

(c) if any getTS(id) has returned $(rnd, turn)$ then $R[rnd] \ne \bot$; and

(d) if $R[k] \ne \bot$ then $\forall k' < k$, $R[k'] \ne \bot$.

Proof.

(a) No getTS method call ever writes $\bot$ to any shared register.

(b) The only writes to a shared register occur at lines 8, 11 and 15. In any single instance of getTS, say getTS(id), in each iteration $j$ of the for-loop (line 5), for $1 \le j \le myrnd_{id} - 1$, at most one write occurs, either at line 8 or 11 but not both, and any such write is to $R[j]$. If getTS(id) writes at line 15, it writes to $R[myrnd_{id} + 1]$. So, in any single execution of getTS(id), each register is written at most once. Every write by getTS(id) to a register $R[j]$ sets $last(R[j].seq)$ to $id$, which is distinct for each getTS method call.

(c) getTS(id) returned at line 9, 12 or 16. We show that in all cases the register $R[rnd]$ was written before getTS(id) returned. Then the claim follows by (a). If getTS(id) returned in line 9 then $rnd = myrnd_{id}$, and $R[myrnd_{id}] \ne \bot$ when the while-loop of getTS(id) completes. If getTS(id) returned in line 12, then $rnd = myrnd_{id} + 1$. Before returning, however, getTS(id) evaluated the if-statement in line 6 to be false, implying $R[myrnd_{id} + 1] \ne \bot$. If getTS(id) returned in line 16, then $rnd = myrnd_{id} + 1$ and either getTS(id) evaluated the if-statement in line 14 to be false, or getTS(id) wrote to $R[myrnd_{id} + 1]$ in line 15. In either case, $R[myrnd_{id} + 1] \ne \bot$ before getTS(id) returned.

(d) Consider any write to a register $R[k]$ and suppose it occurs in the execution of getTS(id). The while-loop of getTS(id) confirms that all registers $R[1]$ through $R[myrnd_{id}]$ were previously non-$\bot$, before any write by getTS(id). Writes only occur in lines 8, 11 and 15 of getTS. Every write by getTS(id) in lines 8 and 11 is to some register $R[j]$ where $j < myrnd_{id}$. A write in line 15 by getTS(id) is to $R[myrnd_{id} + 1]$. So in all cases, when the write to $R[k]$ occurred, registers $R[1]$ through $R[k-1]$ were previously non-$\bot$. The claim follows by (a).

□



Definition 6.2. A getTS() method fails in iteration $j$ in line 6 if, in its $j$-th iteration of the for-loop (line 5), the if-condition in line 6 returns false; it fails in iteration $j$ in line 7 if, in its $j$-th iteration of the for-loop, the if-condition in line 7 returns false; and it fails in iteration $j$, if either it fails in iteration $j$ in line 6 or it fails in iteration $j$ in line 7.

Claim 6.3. If $myrnd_p \ge myrnd_q$ for two method calls getTS(p) and getTS(q), and getTS(p) writes to $R[j]$ before the $j$-th iteration of the for-loop of getTS(q) begins, then getTS(q) fails in iteration $j$.

Proof. $R[myrnd_p] \ne \bot$ when getTS(p) executed line 1 of its while-loop for $j = myrnd_p$, and thus by Claim 6.1(a) remains non-$\bot$ after the while-loop completes.

First, suppose $myrnd_p > myrnd_q$. By Claim 6.1(a) and (d), $R[myrnd_q + 1] \ne \bot$ when getTS(q) executes its $j$-th iteration of the for-loop. So the if-condition in line 6 of getTS(q) returns false, and getTS(q) fails at iteration $j$.

Now, suppose $myrnd_p = myrnd_q$. getTS(p) wrote to $R[j]$ after executing its while-loop and therefore after $R[myrnd_q]$ became non-$\bot$. The content of $r[myrnd_q]_q$, which $q$ read from $R[myrnd_q]$, came from the value of a scan taken during the execution of some getTS when $R[myrnd_q] = \bot$. Hence when getTS(q) executes line 7 of the $j$-th iteration of the for-loop, it compares the value of $r[myrnd_q]_q.seq[j]$, which is the value of $last(R[j].seq)$ that $R[j]$ had before $R[j]$ was written by getTS(p), to a value of $last(R[j].seq)$ that $R[j]$ had after this write. So by Claim 6.1(b), this comparison must return false, and getTS(q) fails at iteration $j$. □

Lemma 6.4. Provided $m = f(M)$ is large enough, Algorithms 3 and 4 implement a timestamp object for any asynchronous shared memory system of processes that invokes getTS a total of at most $M$ times.

Proof. Let getTS(p) and getTS(q) return timestamps $(rnd_p, turn_p)$ and $(rnd_q, turn_q)$ respectively. We need to show that if getTS(p) happens before getTS(q), then compare($(rnd_p, turn_p)$, $(rnd_q, turn_q)$) returns true. That is, we need to show that

$$(rnd_p < rnd_q) \lor \big((rnd_p = rnd_q) \land (turn_p < turn_q)\big).$$

By Claim 6.1(c), after getTS(p) completes, $R[rnd_p] \ne \bot$. Therefore by Claim 6.1(a) and (d), $R[1], \ldots, R[rnd_p] \ne \bot$ throughout the method call getTS(q). Thus, at line 4 after the while-loop, getTS(q) sets $myrnd_q \ge rnd_p$. If getTS(q) returns at line 12 or 16, $rnd_q = myrnd_q + 1 \ge rnd_p + 1$ implying $(rnd_p < rnd_q)$ so compare($(rnd_p, turn_p)$, $(rnd_q, turn_q)$) returns true as required. If getTS(q) returns at line 9, and $myrnd_q > rnd_p$, then again compare($(rnd_p, turn_p)$, $(rnd_q, turn_q)$) returns true. Therefore, suppose that getTS(q) returns at line 9 and $rnd_q = myrnd_q = rnd_p$. In this case, $turn_q$ is non-zero. If $p$ returns at line 12 or 16, $turn_p$ is zero. So again compare($(rnd_p, turn_p)$, $(rnd_q, turn_q)$) returns true.

The only remaining case is that both getTS(p) and getTS(q) return at line 9 and $rnd_q = myrnd_q = rnd_p = myrnd_p$. In this case, we show that $turn_q > turn_p$ by proving that getTS(q) fails at every iteration 1 through $turn_p$. By Claim 6.3 it suffices to show that for every $j$, $1 \le j \le turn_p$, there is some getTS(p'), with $myrnd_{p'} \ge myrnd_q$ that writes to $R[j]$ before getTS(q) begins iteration $j$. For getTS(p), the if-condition in line 7 must have returned false for all iterations 1 through $turn_p - 1$, and then returned true in iteration $turn_p$. For $j < turn_p$, when getTS(p) fails at iteration $j$, it reads $R[j]$ (line 10). If this read shows $R[j].rnd \ge myrnd_p$ there must be a getTS(p'), with $myrnd_{p'} \ge myrnd_p$ that wrote this. If the read shows $R[j].rnd < myrnd_p$, then getTS(p) itself writes to $R[j]$ changing $R[j].rnd$ to $myrnd_p$ (line 11). For $j = turn_p$ process $p$ itself writes into register $R[j]$. In all cases, the write happened before getTS(q). □

6.3. Space Complexity of Algorithm 4

Fix an arbitrary execution $E$ that contains at most $M$ getTS() invocations. In this subsection we prove no register beyond $R[\lceil 2\sqrt{M} \rceil]$ is accessed in $E$.

The proof proceeds as follows. We partition $E$ into phases. Phase 0 starts at the beginning of $E$. Phase $\varphi \ge 1$ starts at the point of the first scan (line 13) by some getTS(p), for which $myrnd_p = \varphi - 1$. We say that phase $\varphi$ completes during $E$, if phase $\varphi + 1$ starts during $E$. Call the first write to $R[j]$ during any phase an invalidation write. First, Claim 6.8 shows that only registers $R[1]$ through $R[\varphi]$ can be written during phase $\varphi$. Next, Claim 6.10 establishes that if phase $\varphi$ completes then it contains exactly $\varphi$ invalidation writes. Finally, we define a charging mechanism that charges each invalidation write in $E$ to some write in $E$ in such a way that there are at most 2 charges to all the writes of any one getTS() invocation. This gives us Claim 6.13 which states that there are a total of at most $2M$ invalidation writes.



Therefore, the total number of phases, $\Phi$, satisfies $\sum_{\varphi=1}^{\Phi} \varphi \le 2M$. Hence, $\Phi < 2\sqrt{M}$. The algorithm uses a final sentinel register that is read but never written, and always contains the initial value $\bot$. So at most $\lceil 2\sqrt{M} \rceil$ registers are accessed in $E$. Therefore, once Claims 6.8, 6.10 and 6.13 are proved (below) we have the following:

Lemma 6.5. Algorithm 4 uses at most $\lceil 2\sqrt{M} \rceil$ registers for $M$ getTS() operations.
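The arithmetic behind this bound can be checked with a short sketch. It rests on an assumption about the counting: that phases $1, \ldots, \Phi$ contribute $1 + 2 + \cdots + \Phi$ invalidation writes in total and that this total is at most $2M$. The helper names `max_phases` and `ceil_two_sqrt` are illustrative.

```python
import math


def max_phases(M):
    """Largest Phi with 1 + 2 + ... + Phi <= 2*M."""
    phi = 0
    while (phi + 1) * (phi + 2) // 2 <= 2 * M:
        phi += 1
    return phi


def ceil_two_sqrt(M):
    """ceil(2*sqrt(M)) for M >= 1, computed with exact integer arithmetic."""
    return math.isqrt(4 * M - 1) + 1


def bound_holds(M):
    phi = max_phases(M)
    # Phi(Phi+1)/2 <= 2M gives Phi^2 < 4M, so Phi < 2*sqrt(M);
    # adding the never-written sentinel register stays within ceil(2*sqrt(M)).
    return phi * phi < 4 * M and phi + 1 <= ceil_two_sqrt(M)
```

Under that assumption, $\Phi(\Phi + 1)/2 \le 2M$ forces $\Phi^2 < 4M$, so the written registers plus the sentinel fit into $\lceil 2\sqrt{M} \rceil$ registers.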



Technical claims

The proof of Lemma 6.5 via Claims 6.8, 6.10 and 6.13 is the most challenging part of our arguments concerning Algorithm 4. First, we encapsulate the relationship between the value of the variable $myrnd_p$ of a getTS(p) method call and the phase number $\varphi$ during which getTS(p) writes to certain registers.

Claim 6.6.

(a) If getTS(p) writes to register $R[i]$ when $R[i+1] = \bot$, then that write occurs in line 15.

(b) getTS(p) executes line 15 during some phase $\varphi \ge myrnd_p + 1$.

(c) getTS(p) executes line 4 during some phase $\varphi' \ge myrnd_p$.

Proof. (a) If $w$ is a write by getTS(p) to $R[i]$ in line 8 or 11, then $i \le myrnd_p - 1$. When getTS(p) previously read $R[myrnd_p]$ in line 2, its value was non-$\bot$, so, by Claim 6.1(a) and (d), $R[i+1]$ is non-$\bot$ when $w$ occurred. Hence, any write to $R[i]$ when $R[i+1] = \bot$ could not have occurred at line 8 or 11, and thus could only occur at line 15.

(b) When getTS(p) executes line 15, it has already executed its scan in line 13. By definition of phase, if phase $myrnd_p + 1$ had not already begun before this scan occurred, then it began with this scan.

(c) Before getTS(p) executes line 4, its while-loop terminated because $R[myrnd_p] \ne \bot$ and $R[myrnd_p + 1] = \bot$. By (a), $R[myrnd_p]$ must have previously changed from $\bot$ to non-$\bot$ when some getTS(r) executed line 15. When getTS(r) executed this write, it wrote to $R[myrnd_r + 1]$, so $myrnd_r + 1 = myrnd_p$. Before this write, getTS(r) executed a scan at line 13, which either started phase $myrnd_r + 1$, or phase $myrnd_r + 1$ had already started. Thus, phase $myrnd_r + 1 = myrnd_p$ started before getTS(p) executed line 4. □

Claim 6.7. If during phase myrndp, getTSCp) fails iteration i at line 7, then during 
phase myrndp and before the failure, there was a write to R[i] and a write to R[myrndp]. 

Proof. Let $v = (v_1, \ldots, v_k)$ be the value of the sequence stored in
$R[\mathit{myrnd}_p].seq$ when getTS(p) reads that register in line 2. Let getTS(b) be the
method call that wrote $v$ to $R[\mathit{myrnd}_p].seq$. The while-loop of getTS(p) terminated
when getTS(p) read $R[\mathit{myrnd}_p + 1] = \bot$ in the $(\mathit{myrnd}_p + 1)$-th iteration
of its while-loop after reading $R[\mathit{myrnd}_p] \neq \bot$ in its previous iteration. By
Claim 6.1(a), $R[\mathit{myrnd}_p + 1] = \bot$ when getTS(b) wrote $v$ to $R[\mathit{myrnd}_p]$.
Therefore, by Claim 6.6(a), getTS(b) wrote to $R[\mathit{myrnd}_p]$ in line 15 and thus
$\mathit{myrnd}_b = \mathit{myrnd}_p - 1$. By Claim 6.6(b), getTS(b)'s write to
$R[\mathit{myrnd}_p]$ happens during phase $\mathit{myrnd}_p$ or a later phase. Because this
write happens before getTS(p) reads $R[i]$ in line 7 during phase $\mathit{myrnd}_p$, getTS(b)'s
write to $R[\mathit{myrnd}_p]$ occurs during phase $\mathit{myrnd}_p$ before getTS(p) fails at
iteration $i$.

Before executing line 15, getTS(b) executed a scan in line 13, and obtained $v_i$ for the
value of $last(R[i].seq)$. By the assumption that getTS(p) fails at iteration $i$ in line 7,
$v_i \neq last(R[i].seq)$ when getTS(p) reads $R[i]$ in line 7. Therefore, there must have
been a write to $R[i]$ between getTS(b)'s scan and getTS(p)'s read at line 7. This write
must have occurred in phase $\mathit{myrnd}_p$ because, by the definition of phase, either phase
$\mathit{myrnd}_p$ began before getTS(b)'s scan or it began at this scan. Thus, a write to $R[i]$
occurs in phase $\mathit{myrnd}_p$ before getTS(p) fails iteration $i$. □

Counting invalidation writes per completed phase 

Our goal is to show that during phase $\varphi$ there is exactly one invalidation write to each
register $R[1]$ through $R[\varphi]$, and no other registers are written.

Claim 6.8. Only registers $R[1], \ldots, R[\varphi]$ are written during phase $\varphi$.

Proof. Until some getTS() has executed line 15, and thus phase 1 has started,
no register is written. Hence, the claim is trivially true for $\varphi = 0$. Now let
$\varphi \ge 1$. If getTS(q) writes to $R[j]$ in line 8 or 11, then by Claim 6.6(c),
$\varphi \ge \mathit{myrnd}_q$, and by the semantics of the for-loop, $j < \mathit{myrnd}_q$. If
getTS(q) writes to $R[j]$ in line 15, then $j = \mathit{myrnd}_q + 1$ and by Claim 6.6(b),
$\varphi \ge \mathit{myrnd}_q + 1$. Hence, in either case $j \le \varphi$. □



Claim 6.9. If phase $\varphi$ completes in $E$, then for each $1 \le j \le \varphi$, there is
at least one write to $R[j]$ during phase $\varphi$.

Proof. By definition, phase $\varphi + 1 > 1$ begins at the first scan (line 13) by some
getTS(p) for which $\mathit{myrnd}_p = \varphi$. Since getTS(p) executes line 13, its call does
not return during the for-loop. Therefore, this scan can happen only if this getTS(p) fails
in iteration $j$ at line 7 for every $1 \le j \le \varphi - 1$. Thus, by Claim 6.7, a write to
register $R[\varphi]$ and a write to $R[j]$ for every $1 \le j \le \varphi - 1$ happen in
phase $\varphi$. □

Claim 6.10. There are exactly $\varphi$ invalidation writes in any completed phase $\varphi$.



Proof. By Claims 6.8 and 6.9, exactly the registers $R[1]$ through $R[\varphi]$ are written
during phase $\varphi$. The set of first writes to each of these registers during phase
$\varphi$ is, by definition, the set of invalidation writes in phase $\varphi$. □
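
The counting in Claim 6.10 can be illustrated on a toy trace. The sketch below is not part of Algorithm 4; it assumes a hypothetical log of (phase, register index) pairs in execution order and simply picks out the first write to each register in each phase, which is how invalidation writes are characterized above.

```python
from collections import defaultdict


def invalidation_writes(log):
    """Return the entries of `log` that are the first write to their
    register within their phase.

    `log` is a hypothetical trace format (an assumption of this sketch):
    a list of (phase, register_index) pairs in execution order.
    """
    seen = set()
    first = []
    for phase, reg in log:
        if (phase, reg) not in seen:
            seen.add((phase, reg))
            first.append((phase, reg))
    return first


def per_phase_counts(log):
    """Count invalidation writes per phase."""
    counts = defaultdict(int)
    for phase, _ in invalidation_writes(log):
        counts[phase] += 1
    return dict(counts)


# Synthetic trace: phase phi writes exactly R[1..phi], some registers twice.
log = [
    (1, 1),
    (2, 1), (2, 2), (2, 1),
    (3, 3), (3, 1), (3, 2), (3, 2),
]
counts = per_phase_counts(log)
```

On this synthetic trace every completed phase contributes as many invalidation writes as its phase number, matching the statement of the claim; repeated writes to the same register within a phase are not counted.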

Counting all invalidation writes 

We rely on some definitions and factor out some sub-claims to facilitate the proof of
Claim 6.13 below.

Claim 6.11. If a write at line 11 by getTS(p) happens during phase $\mathit{myrnd}_p$, then
that write is not an invalidation write.

Proof. Let $w$ be a write at line 11 by getTS(p) to register $R[j]$ that occurs during
phase $\mathit{myrnd}_p$. Then $w$ happens only if getTS(p) fails at iteration $j$ at line 7.
So, according to Claim 6.7, there was a previous write to $R[j]$ during phase
$\mathit{myrnd}_p$. Hence, $w$ is not an invalidation write. □

There can be an interval between when the first getTS(q) with $\mathit{myrnd}_q = \varphi - 1$
does a scan at line 13, thus starting phase $\varphi$, and when the first write to $R[\varphi]$
happens, which is the first point at which other getTS() method calls can discern that the
current phase is (at least) $\varphi$. To capture this, say that phase $\varphi$ is invisible
from the beginning of phase $\varphi$ to the step before the first write to $R[\varphi]$, and
visible from the first write to $R[\varphi]$ to the end of phase $\varphi$.
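
To make the invisible/visible distinction concrete, the following sketch locates both boundaries in a toy trace. The event encoding is an assumption made only for this illustration: `("scan", r)` stands for a line 13 scan by a getTS with myrnd = r, and `("write", j)` stands for a write to R[j].

```python
def split_phase(events, phi):
    """Return (start, visible_from) as indices into `events` for phase `phi`.

    `start` is the index of the first scan by a call with myrnd == phi - 1
    (the step that begins phase phi). `visible_from` is the index of the
    first subsequent write to R[phi]. Either value is None if the
    corresponding step does not occur in the trace.
    """
    start = None
    for idx, (kind, value) in enumerate(events):
        if start is None:
            if kind == "scan" and value == phi - 1:
                start = idx
        elif kind == "write" and value == phi:
            return start, idx
    return start, None


# Hypothetical trace: phase 2 starts at index 2 and becomes visible at index 4.
events = [
    ("scan", 0), ("write", 1),   # phase 1 starts, then becomes visible
    ("scan", 1),                 # phase 2 starts (invisible part begins)
    ("write", 1),                # a write during the invisible part of phase 2
    ("write", 2),                # first write to R[2]: phase 2 becomes visible
    ("write", 1),
]
start, visible_from = split_phase(events, 2)
```

In this trace the write at index 3 falls in the invisible part of phase 2, which is the situation the sets $I_3$ and $I_4$ in the proof of Claim 6.13 are concerned with.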

Claim 6.12. Any write by getTS(p) at line 8 or at line 15, or any write at line 11
that happens after phase $\mathit{myrnd}_p + 1$ becomes visible, is getTS(p)'s last write.

Proof. Algorithm 4 returns after any write at line 8 or line 15, so such a write is
the last write of the method call. Now consider a line 11 write $w$ by getTS(p). If $w$
happens at any time after phase $\mathit{myrnd}_p + 1$ becomes visible, then after $w$, getTS(p)
will discern that the phase number has increased when it reads $R[\mathit{myrnd}_p + 1]$ to be
non-$\bot$ either at line 6 in the next iteration of the for-loop or, if the for-loop is
complete, at line 14. In either case getTS(p) returns without another write, so such a write
is also the last write of the method call. □

Claim 6.13. There are at most 2M invalidation writes in execution E. 

Proof. Define:

$A = \{\, w \mid w \text{ is a first invalidation write by some getTS() method} \,\}$

$B = \{\, w \mid w \text{ is the last write by some getTS() method and } w \text{ is an invalidation write} \,\}$

$C = \{\, w \mid w \text{ is the last write by some getTS() method and } w \text{ is not an invalidation write} \,\}$

Let $W^*$ be the disjoint union of $A$, $B$ and $C$. Since each getTS() can have at most
one write in $A$ and at most one write in either $B$ or $C$ but not both, it follows that
$|W^*| \le 2M$. Let $W$ be the set of all writes, and let $I$ be the set of all invalidation
writes during execution $E$. We will define a function $f : I \to W$. Then, it will suffice
to show that $f$ is injective with range a subset of $W^*$.

Define:

$I_1 = \{\, w \mid w \text{ is an invalidation write at line 8 or 15} \,\}$

$I_2 = \{\, w \mid \exists\, \text{getTS}(p)$ satisfying ($w$ is an invalidation write at line 11 by getTS(p)) and
($w$ happens after the beginning of the visible part of phase $\mathit{myrnd}_p + 1$) $\}$

$I_3 = \{\, w \mid \exists\, \text{getTS}(p)$ satisfying ($w$ is an invalidation write at line 11 by getTS(p)) and
($w$ happens during the invisible part of phase $\mathit{myrnd}_p + 1$) and
($w$ is getTS(p)'s first invalidation write) $\}$

$I_4 = \{\, w \mid \exists\, \text{getTS}(p)$ satisfying ($w$ is an invalidation write at line 11 by getTS(p)) and
($w$ happens during the invisible part of phase $\mathit{myrnd}_p + 1$) and
($w$ is not getTS(p)'s first invalidation write) $\}$

Let $w$ be a write by getTS(p). Then $w$ happens at line 8, line 11 or line 15, and, by
Claim 6.6(c), $w$ happens in some phase $\varphi$ satisfying $\varphi \ge \mathit{myrnd}_p$. By
Claim 6.11, if $w$ is a write at line 11 that occurs during phase $\mathit{myrnd}_p$, then $w$
is not an invalidation write. Hence $I_1 \cup I_2 \cup I_3 \cup I_4 = I$. Also, clearly $I_1$,
$I_2$, $I_3$ and $I_4$ are mutually disjoint. Therefore $\{I_1, I_2, I_3, I_4\}$ is a
partition of $I$.



For all $w \in I_1 \cup I_2 \cup I_3$ define $f(w) = w$. By definition, $I_3 \subseteq A$, and
by Claim 6.12, $I_1 \cup I_2 \subseteq B$. So, $f$ maps $I_1 \cup I_2 \cup I_3$ to the disjoint
union of $A$ and $B$ and clearly, $f$ is injective on $I_1 \cup I_2 \cup I_3$.

It remains to map $I_4$ to $C$ and show this map is injective. Let $w$ be a write in $I_4$ by
getTS(p) to register $R[i]$. By definition of $I_4$, $w$ occurs during the invisible part of
phase $\mathit{myrnd}_p + 1$, and there is another invalidation write, say $u$, by getTS(p) that
precedes $w$ in $E$. Claims 6.6(c), 6.11 and 6.12 imply that $u$ is a line 11 write that also
occurs during the invisible part of phase $\mathit{myrnd}_p + 1$.

In line 10, before executing $w$, getTS(p) reads a value $x < \mathit{myrnd}_p$ from
$R[i].rnd$. Let $o$ be this read operation. Operation $o$ occurs after $u$ and before $w$ and
thus also during the invisible part of phase $\mathit{myrnd}_p + 1$. Define $f(w)$ to be the
write operation that wrote the value to $R[i]$ that was read by $o$.

We now show that $f(w)$ is in $C$. Let getTS(a) be the method call that starts phase
$\mathit{myrnd}_p + 1$ by executing a scan in line 13. Then $\mathit{myrnd}_a = \mathit{myrnd}_p$,
and during phase $\mathit{myrnd}_p$, getTS(a) fails at iteration $j$ at line 7 and thus executes
line 10, for all $j = 1, \ldots, \mathit{myrnd}_p - 1$. In particular for $j = i$, getTS(a)
either writes the value $\mathit{myrnd}_a = \mathit{myrnd}_p > x$ to $R[i].rnd$ in line 11, or in
line 10 it reads $R[i].rnd \ge \mathit{myrnd}_a > x$. Hence, $f(w)$, which writes the value $x$
to $R[i].rnd$ that is read by $o$, must happen after the $i$-th iteration of getTS(a)'s
for-loop and before $o$. Furthermore, $f(w)$ cannot happen during phase $\mathit{myrnd}_p + 1$,
because otherwise $w$ would not be an invalidation write. We conclude that $f(w)$ is a write
to $R[i]$ that happened in phase $\mathit{myrnd}_a$ after getTS(a) failed at iteration $i$, and
hence, by Claim 6.7, $f(w)$ is not an invalidation write.

Let getTS(b) be the method call that executes $f(w)$. When getTS(a) finished its
while-loop, $R[\mathit{myrnd}_a] = R[\mathit{myrnd}_p] \neq \bot$. Hence, by Claim 6.1(a),
$R[\mathit{myrnd}_p] \neq \bot$ when $f(w)$ occurs. Since $\mathit{myrnd}_b = x < \mathit{myrnd}_p$,
by Claim 6.1(a) and (d), $R[\mathit{myrnd}_b + 1] \neq \bot$ when getTS(b) executes either the
if-statement in line 6 (in iteration $i + 1$ of its for-loop) or in line 14 (if its for-loop
is completed because $i = \mathit{myrnd}_b - 1$). In either case, getTS(b) returns without
another write. Therefore, $f(w)$ is the last write by getTS(b) and is not an invalidation
write, so $f(w) \in C$ by definition of $C$.

Finally, we show that $f(\cdot)$ on the domain $I_4$ is injective. If $f(w)$ occurs in phase
$\varphi$ then, as we have just seen, $w$ occurs during the invisible part of phase
$\varphi + 1$. Suppose $f(w) = f(w')$ where $w, w' \in I_4$. Then $w$ and $w'$ are both
invalidation writes to the same register during the same phase $\varphi + 1$. But this is
impossible since there can be only one invalidation write per register per phase. □
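
For orientation, the map built in the proof of Claim 6.13 can be summarized in one display. The second line is a consequence we add here as a sanity check; it is not stated in the text and assumes that $\Phi$ denotes the number of completed phases in $E$ (a symbol introduced only for this remark), using Claim 6.10 for each of them.

```latex
\[
f(w) =
\begin{cases}
  w, & w \in I_1 \cup I_2 \cup I_3,\\[2pt]
  \text{the write to } R[i] \text{ whose value is read by } o, & w \in I_4,
\end{cases}
\qquad
|I| \;\le\; |W^*| \;\le\; 2M .
\]
\[
\frac{\Phi(\Phi+1)}{2} \;=\; \sum_{\varphi=1}^{\Phi} \varphi \;\le\; |I| \;\le\; 2M
\quad\Longrightarrow\quad
\Phi^2 < 4M
\quad\Longrightarrow\quad
\Phi < 2\sqrt{M} .
\]
```

The first case sends $I_1 \cup I_2$ into $B$ and $I_3$ into $A$; the second case sends $I_4$ into $C$. Injectivity on each part, together with the fact that the three targets are kept apart in the disjoint union $W^*$, gives the bound.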

6.4. Algorithms 3 and 4 are Wait-free

Lemma 6.14. Algorithms 3 and 4 are wait-free provided the bound $M$ on the number
of allowed getTS method calls is fixed in advance.

Proof. Clearly, compare is wait-free. The number of registers $m$ provided for
getTS is at least one more than the number that can be written, so the last register
$R[m]$ is always $\bot$. Since each iteration of the while-loop increments $j$ until
$R[j] = \bot$ is read, the while-loop can iterate at most $m - 1$ times. Similarly, since
myrnd is the index of a non-$\bot$ register, the for-loop can iterate at most $m - 2$ times.
All operations inside and outside the while and for loops, except the scan, are wait-free.
Hence, it remains to show that all calls of scan terminate within a bounded number of steps.
It is immediate from the code that each getTS() writes to each register at most once,
implying each getTS method writes fewer than $m = \lceil 2\sqrt{M} \rceil$ times. Thus, after
a finite number of reads during the scan, the scanning process must see no more changes to
registers, and so will achieve a successful double collect and terminate. □
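
The termination argument for scan can be illustrated with a small sequential simulation. This is only a sketch under simplifying assumptions: a collect is modelled as an atomic copy of a Python list, concurrent writers are modelled by a hypothetical `interfere` hook called between collects, and every write stores a fresh value so that any write is detected. The helper `register_count` computes $\lceil 2\sqrt{M} \rceil$ with integer arithmetic.

```python
import math


def register_count(M):
    """Return ceil(2 * sqrt(M)) exactly, as ceil(sqrt(4 * M))."""
    r = math.isqrt(4 * M)
    return r if r * r == 4 * M else r + 1


def double_collect_scan(registers, interfere):
    """Collect repeatedly until two consecutive collects agree.

    Returns (snapshot, number_of_collects). `interfere` is a hypothetical
    stand-in for steps taken by other processes between two collects.
    """
    prev = list(registers)
    collects = 1
    while True:
        interfere()
        cur = list(registers)
        collects += 1
        if cur == prev:
            return cur, collects
        prev = cur


def simulate(M):
    """Worst case for this model: one interfering write between collects."""
    m = register_count(M)
    registers = [None] * m          # None plays the role of the initial value
    budget = [M * (m - 1)]          # each call writes fewer than m times

    def interfere():
        if budget[0] > 0:
            k = budget[0] % (m - 1)            # last register is never written
            registers[k] = ("val", budget[0])  # fresh value on every write
            budget[0] -= 1

    _, collects = double_collect_scan(registers, interfere)
    return m, collects


m, collects = simulate(4)
```

Because the total number of writes is bounded once $M$ is fixed, the interference eventually stops and the next two collects agree; in this model the scan uses at most two more collects than there are interfering writes.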

7. FURTHER REMARKS

The lower and upper bounds for long-lived and one-shot timestamps compare and contrast
in several ways. In the execution constructed in the lower bound for one-shot
timestamps, each process that participates in a block write takes no further steps
in the computation. As a consequence, the proof actually applies without change if
each register is replaced by any historyless object. The asymptotically matching upper
bound is, however, achieved using registers. In contrast, our proof of the lower
bound for long-lived timestamps does not extend to historyless objects. So it remains
an open question whether there is an implementation of long-lived timestamps from
a sub-linear number of historyless objects. Both the long-lived and the one-shot lower
bounds apply even to non-deterministic solo-terminating algorithms, while the
asymptotically matching algorithms are wait-free.

The upper bound for one-shot timestamps applies for any bounded number of
getTS() method invocations. The covering argument in the proof of the lower bound,
however, prevents any similar generalization: it depends on each process invoking at
most one getTS().

The one-shot algorithm generalizes even to the situation where the number of
getTS() method invocations is not bounded, provided that the system can acquire
additional registers as needed. In this case, however, progress would only be
non-blocking rather than wait-free.